ABSTRACT


This book targets computer scientists and engineers who are familiar with concepts in classi- 
cal computer systems but are curious to learn the general architecture of quantum computing 
systems. It gives a concise presentation of this new paradigm of computing from a computer 
systems’ point of view without assuming any background in quantum mechanics. As such, it is 
divided into two parts. The first part of the book provides a gentle overview of the fundamental
principles of quantum theory and their implications for computing. The second part is devoted
to state-of-the-art research in designing practical quantum programs, building a scalable
software systems stack, and controlling quantum hardware components. Most chapters end with 
a summary and an outlook for future directions. This book celebrates the remarkable progress 
that scientists across disciplines have made in the past decades and reveals what roles computer 
scientists and engineers can play to enable practical-scale quantum computing. 
Preface


Quantum computing is at a historic time in its development and there is a great need for research 
in quantum computer systems. This book stems from a course we co-taught in 2018 and the re- 
search efforts of the EPiQC NSF Expedition in Computing and others. Our goal is to provide 
a broad overview of some of the emerging research areas in the development of practical com- 
puting systems based upon emerging noisy intermediate-scale quantum hardware. It is our hope 
that this book will encourage other researchers in the computer systems community to pursue 
some of these directions and help accelerate real-world applications of quantum computing. 

Despite the impressive capability of today’s digital computers, there are still some compu- 
tational tasks that are beyond their reach. Remarkably, some of those tasks seem to be relatively 
easy with a quantum computer. Over the past four decades or so, our understanding of the
theoretical power of quantum computing and our skills in quantum engineering have advanced
significantly. Small-scale prototypes of programmable quantum computers are emerging from
academic and industry labs around the world. This is undoubtedly an exciting time, as we may
soon be fortunate enough to be among the first to witness the application of quantum computers
to problems that are infeasible for today’s classical computers. What has been truly remarkable
is that the field
of quantum information science has brought scientists together across disciplines—physicists, 
electrical engineers, computer architects, and theorists, just to name a few. 

Looking back at the historical progress in digital computers, we remark upon the three 
major milestones that led to the integration of millions of computational units that make up the 
computing power in today’s computers: low-cost integrated circuit technology, efficient
architectural design, and an interconnected software ecosystem. It is not unrealistic to assume that
the evolution of quantum computers will follow a similar trajectory; we are starting to see some
innovations in hardware, software, and architecture designs that have the potential to scale up
well. The progress and prospects of this new paradigm of computing have motivated us to write
this Synthesis Lecture, which we hope will bring together more computer scientists
and engineers to join the expedition to practical-scale quantum computation.

This introduction to quantum computer systems should primarily appeal to computer sys- 
tems researchers, software engineers, and electrical engineers. The focus of this book is on sys- 
tems research for noisy intermediate-scale quantum (NISQ) computers, highlighting the recent 
progress and addressing the near-term challenges for realizing the computational power of QC 
systems. 


Reading This Book 


The aim of this book is to provide computer systems researchers and engineers with an introduc- 
tory guide to the general principles and challenges in designing practical quantum computing 
systems. Compared to its predecessor in the series, Quantum Computing for Computer Archi- 
tects by Metodi, Faruque, and Chong [1], this book targets near-term progress and prospects of 
quantum computing. Throughout the book, we emphasize how computer systems researchers
can contribute to this exciting emerging field. The structure of this book is as follows.
Chapter 2 reviews the central concepts in quantum computation, compares and contrasts with 
those of classical computation, and discusses the leading technologies for implementing qubits. 
Chapter 3 summarizes the general features in quantum algorithms and reviews some of the 
important NISQ applications. 

The second part of the book starts in Chapter 4 with an overview of the quantum
architectural vertical stack and the cross-cutting themes that enable synergy among the different
disciplines in the field. The rest of the book illuminates the opportunities in quantum com- 
puter systems research, broadly split into five tracks: (i) Chapter 5 describes existing quantum 
programming languages and techniques for debugging and verification; (ii) Chapter 6 intro- 
duces important quantum compilation methods including circuit optimization and synthesis; 
(iii) Chapter 7 dives into low-level quantum controls, pulse generation, and calibration; (iv) a 
number of noise mitigation and error correction techniques are reviewed in Chapter 8; (v) Chap- 
ter 9 discusses different methods in classical simulations of quantum circuits and their implica- 
tions; and (vi) a summary of progress and prospects of quantum computer systems research can 
be found in Chapter 10. 

The reader is encouraged to start with the Summary and Outlook section in some chapters
for a quick overview of fundamental concepts, highlights of state-of-the-art research, and
discussions of future directions. 


Yongshan Ding and Frederic T. Chong 
Chicago, June 2020 


Acknowledgments 


Our views in the book are strongly informed by ideas formed from discussions with Yuri Alex- 
eev, Kenneth Brown, Chris Chamberland, Isaac Chuang, Andrew Cross, Bill Fefferman, Diana 
Franklin, Alexey Gorshkov, Hartmut Haeffner, Danielle Harlow, Aram Harrow, Henry Hoff- 
man, Andrew Houck, Ali Javadi-Abhari, Jungsang Kim, Peter J. Love, Margaret Martonosi, 
Akimasa Miyake, Chris Monroe, William Oliver, John Reppy, David Schuster, Peter Shor, 
Martin Suchara, members of the EPiQC Project (Enabling Practical-scale Quantum Com- 
putation, an NSF Expedition in Computing), and members of the STAQ Project (Software- 
Tailored Architecture for Quantum co-design). Thanks are extended to the students who took 
the 2018 course on quantum computer systems for their helpful lecture scribing notes: Anil 
Bilgin, Xiaofeng Dong, Shankar G. Menon, Jean Salac, and Lefan Zhang, among others. 

Thanks to Morgan & Claypool Publishers for making the publication of this book possible.
Many thanks to Michael Morgan, who invited us to write on the subject, for his patience
and encouragement. Thanks also to our Synthesis Lecture series editors Natalie Enright Jerger 
and Margaret Martonosi, who shepherded this project to its final product. YD and FTC are 
grateful to Frank Mueller and the anonymous reviewers for providing in-depth comments and 
suggestions on the original manuscript. Thanks to Sara Kreiman for her thorough copyedit of 
the book. 

YD has learned a tremendous amount from his advisor FTC, and is very grateful for 
FTC’s mentorship in quantum information science research and education. YD also thanks 
Ryan O'Donnell, who first introduced him to the field of quantum computation and informa- 
tion. YD worked on this book while visiting the Massachusetts Institute of Technology. YD es- 
pecially thanks Isaac Chuang, Aram Harrow, and Peter Shor for the many inspiring discussions 
during his visit. YD thanks all of his colleagues, friends, and relatives for their encouragement 
and support in writing and finishing the book, especially Meizi Liu, and YD’s parents, Genlin 
Ding and Shuowen Feng. 

Finally, YD and FTC gratefully acknowledge the support from the National Science
Foundation, specifically by EPiQC, an NSF Expedition in Computing, under grant
CCF-1730449, in part by STAQ, under grant NSF Phy-1818914, and in part by DOE grants
DE-SC0020289 and DE-SC0020331.


Yongshan Ding and Frederic T. Chong 
Chicago, June 2020 


List of Notations 


The nomenclature and notations used in this book may be unfamiliar to many readers and may 
have different meanings in a different context. We devote this section to clarifying some of the 
conventions this book uses to prevent confusion. 


Systems Terminology 


Adiabatic quantum computing is a model of analog quantum computing where a quantum
system remains in its ground state throughout the evolution.


Analog quantum computing (AQC) is a model of quantum computation such that the 
state of a quantum system is evolved smoothly. 


Boolean circuit is a model of classical computation that expresses computation by send- 
ing data through a combination of logic gates. 


FT refers to being fault tolerant; a fault-tolerant quantum computer relies on quantum 
error correction. 


The gate scheduling problem is to design an ordering or synchronization of quantum
gates to be applied to the qubits in the target architecture, under constraints such as 
data dependencies, parallelism, communication, and noise. 


Hamiltonian refers to the mathematical representation of the energy configuration of 
a physical system. It is commonly used as a linear algebraic operator in quantum me- 
chanics. 


Host processor is an abstraction that refers to the classical computer that controls the 
processes in quantum computer systems. 


Quantum annealing is a model of analog quantum computing wherein the quantum
systems interact with the thermal environment.


Lambda calculus is a model of classical computation based on functional expressions 
using variable binding and substitution. 


Measurement-based quantum computing (MBQC) is a model of computation that per- 
forms computation via only measurements on qubits previously initialized to a cluster 
state. 




A NISQ computer refers to a noisy intermediate-scale quantum computer. 


Turing machine is a model of classical computation for abstract computing machines 
based on manipulating data sequentially on a strip of tape following a set of rules. 


Quantum compiling refers to the framework for efficiently implementing a given quan- 
tum program or target unitary to high precision, using gates from a set of primitive 
instructions supported in the underlying quantum architecture. 


Quantum communication is a branch of quantum technology wherein entangled qubits
are used to encrypt and transmit data.


Quantum circuit synthesis refers to the technique that constructs a gate out of a series of
primitive operations.


Quantum device topology (or device connectivity) describes the layout of the physical 
qubits and the allowed direct interactions between any pair of qubits. 


Quantum logic gates (or qubit operations or quantum instructions) are transformations to 
be applied to qubits, represented by unitary matrices. 


The qubit mapping problem aims to find an optimal mapping from the qubit registers
in a quantum program to the qubits in the target architecture, under constraints such 
as system size, data dependencies, communication, ancilla reuse, and noise. 


Quantum processing unit (QPU) refers to a hardware component that implements qubits 
as well as the control apparatus. 


A quantum program is an abstraction that refers to the sequence of instructions and 
control flow that a quantum computer must follow according to a protocol or an algo- 
rithm. 


Quantum sensing is a branch of quantum technology that takes advantage of quantum 
coherence to perform measurements of physical quantities. 


Quantum simulation is a branch of quantum technology that studies the structures and 
properties of electronic or molecular systems. 


Schoelkopf’s law is an empirical scaling projection for quantum decoherence—delayed 
by a factor of 10 roughly every three years. 


The von Neumann architecture is a stored-program computer architecture that controls 
instruction fetch and data operations via a common system computer bus. 


Linear Algebra and Probability in Quantum Computing 


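* The basis of a qubit is a set of linearly independent vectors that span the Hilbert space.
The two most common bases for single qubits are the computational basis (z basis):

$$|0\rangle = \begin{pmatrix} 1 \\ 0 \end{pmatrix}, \quad |1\rangle = \begin{pmatrix} 0 \\ 1 \end{pmatrix},$$

and the Fourier basis (x basis):

$$|+\rangle = \frac{1}{\sqrt{2}} \begin{pmatrix} 1 \\ 1 \end{pmatrix}, \quad |-\rangle = \frac{1}{\sqrt{2}} \begin{pmatrix} 1 \\ -1 \end{pmatrix}.$$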


* The Bloch sphere is a visualization of the single-qubit Hilbert space $\mathcal{H}$ in three-dimensional
Euclidean space $\mathbb{R}^3$:

$$\rho(x, y, z) = \frac{1}{2}\left(I + x\sigma_x + y\sigma_y + z\sigma_z\right).$$


* The bra vector is the conjugate transpose of a ket vector:
$$\langle\psi| = \begin{pmatrix} \alpha^* & \beta^* \end{pmatrix}.$$
* A cluster state is a quantum state defined by a graph, where the nodes in the graph are
qubits initialized to the $|+\rangle$ state, and the edges are controlled-Z gates between the qubits.


* A complex number $z \in \mathbb{C}$ is a number of the form $a + bi$, where $a, b$ are real numbers
and $i$ is the imaginary unit satisfying $i^2 = -1$. $a$ is called the real part, and $b$ is called
the imaginary part of $z$. The conjugate of $z$ is $z^* = a - bi$.


* The conjugate transpose of a matrix $M$ is denoted as $M^\dagger$, whose matrix elements are:

$$[M^\dagger]_{ij} = [M]_{ji}^*.$$


* An EPR pair refers to two qubits in the quantum state $|\mathrm{epr}\rangle = (|00\rangle + |11\rangle)/\sqrt{2}$.


* In the context of the physical implementation of a qubit, the computational basis
corresponds to the discrete energy levels. The common mapping is $|0\rangle$ for the ground
energy state and $|1\rangle$ for the first excited energy state.


* A complex square matrix $H$ is Hermitian if its complex conjugate transpose $H^\dagger$ is equal
to itself:
$$H^\dagger = H.$$




* The Hilbert space $\mathcal{H}$ is a complex inner product space in which an $n$-qubit quantum state
is a $2^n$-dimensional vector of complex entries.


* The inner product of two quantum states $|\psi\rangle = \sum_j \alpha_j |j\rangle$, $|\phi\rangle = \sum_k \beta_k |k\rangle$ is
$\langle\psi|\phi\rangle = \sum_j \alpha_j^* \beta_j$.


* An identity matrix I is a matrix with 1 along the diagonal and 0 everywhere else. 
* For any real number $p \geq 1$, the $\ell_p$ norm of a vector $x = (x_1, \ldots, x_n)$ is defined as
$$\|x\|_p = \left( \sum_{i=1}^{n} |x_i|^p \right)^{1/p}.$$


* $e^M$ and $\exp(M)$ are notations for the matrix exponential, which is defined as:

$$e^M = \sum_{k=0}^{\infty} \frac{M^k}{k!}.$$


* A mixed quantum state or density matrix is a probability ensemble of pure quantum
states: $\rho = \sum_i p_i |\psi_i\rangle \langle\psi_i|$.


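* The Pauli matrices are

$$\sigma_x = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}, \quad \sigma_y = \begin{pmatrix} 0 & -i \\ i & 0 \end{pmatrix}, \quad \sigma_z = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}.$$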
* A probability distribution refers to a finite set of non-negative real numbers $p_i$ that sum
to 1: $p_i \geq 0$ and $\sum_i p_i = 1$.
* A quantum channel is a linear mapping from one mixed state to another mixed state:

$$\rho \mapsto \mathcal{E}(\rho).$$


* Quantum states are represented by (column) vectors in the Hilbert space using Dirac’s
ket vector notation:

$$|\psi\rangle = \begin{pmatrix} \alpha \\ \beta \end{pmatrix}.$$


* sgn(x) is the sign of the number x. 


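* The tensor product of two quantum states $|\psi\rangle = \sum_j \alpha_j |j\rangle$, $|\phi\rangle = \sum_k \beta_k |k\rangle$ is
$|\psi\rangle \otimes |\phi\rangle = \sum_{j,k} \alpha_j \beta_k \, |j\rangle \otimes |k\rangle$.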


* The trace of a matrix $A$ is the sum of its diagonal elements,
$\mathrm{tr}(A) = \sum_i A_{ii} = \sum_i \langle e_i| A |e_i\rangle$,
where $|e_i\rangle$ is the basis vector with 1 at the $i$-th index and 0 everywhere
else.


* A complex square matrix $U$ is unitary if its complex conjugate transpose $U^\dagger$ is also its
inverse:

$$U^\dagger U = U U^\dagger = I.$$


* The system, or wave function, of a qubit can be written as a linear combination of basis 
states. 
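
Several of the definitions in this list can be checked numerically. The following is a minimal sketch, assuming NumPy is available; the helper names (`dagger`, `is_hermitian`, `is_unitary`, `bloch_density`) are illustrative choices and not notation used elsewhere in this book.

```python
import numpy as np

# Computational (z) basis and Fourier (x) basis kets as vectors.
ket0 = np.array([1, 0], dtype=complex)
ket1 = np.array([0, 1], dtype=complex)
ket_plus = (ket0 + ket1) / np.sqrt(2)
ket_minus = (ket0 - ket1) / np.sqrt(2)

# Identity and Pauli matrices.
I2 = np.eye(2, dtype=complex)
sigma_x = np.array([[0, 1], [1, 0]], dtype=complex)
sigma_y = np.array([[0, -1j], [1j, 0]], dtype=complex)
sigma_z = np.array([[1, 0], [0, -1]], dtype=complex)


def dagger(m):
    """Conjugate transpose of a matrix."""
    return m.conj().T


def is_hermitian(m):
    """True if the matrix equals its own conjugate transpose."""
    return np.allclose(dagger(m), m)


def is_unitary(u):
    """True if the conjugate transpose is also the inverse."""
    return np.allclose(dagger(u) @ u, np.eye(u.shape[0]))


def bloch_density(x, y, z):
    """Density matrix rho(x, y, z) = (I + x*sx + y*sy + z*sz) / 2."""
    return (I2 + x * sigma_x + y * sigma_y + z * sigma_z) / 2


# EPR pair (|00> + |11>)/sqrt(2), built with the tensor (Kronecker) product.
epr = (np.kron(ket0, ket0) + np.kron(ket1, ket1)) / np.sqrt(2)

# Inner product <+|->; np.vdot conjugates its first argument.
overlap = np.vdot(ket_plus, ket_minus)

# The pure state |+><+| written as a density matrix.
rho_plus = np.outer(ket_plus, ket_plus.conj())
```

In this sketch, `rho_plus` coincides with `bloch_density(1, 0, 0)`, that is, the state $|+\rangle$ sits on the positive x axis of the Bloch sphere.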


PART I 


Building Blocks 


CHAPTER 1 


Introduction 


Just 40 years ago, the connection between computer science and quantum mechanics was made. 
For the first time, scientists thought to build a device to realize information processing and com- 
putation using the extraordinary theory that governs the particles and nuclei that constitute our 
universe. Since then, we find ourselves time and again amazed by the potential computing power 
offered by quantum mechanics as we understand more and more about it. Some problems that
were previously thought to be intractable now have efficient solutions with a quantum computer.
This potential advantage stems from the unconventional approach that a quantum computer 
uses to store and process information. Unlike traditional digital computers that represent two 
states of a bit with the on and off states of a transistor switch, a quantum computer exploits its 
internal states through special quantum mechanical properties such as superposition and entan- 
glement. For example, a quantum bit (qubit) lives in a combination of the 0 and 1 states at the 
same time. Astonishingly, these peculiar properties offer new perspectives to solving difficult 
computational tasks. This chapter is dedicated to a high-level overview of the rise of quantum 
computing and its disruptive impacts. More importantly, we highlight the computer scientists’ 
role in the endeavor to take quantum computing to practice sooner. 


1.1 THE BIRTH OF QUANTUM COMPUTING 


Paul Benioff began research on the theoretical possibility of building a quantum computer in the 
1970s, resulting in his 1980 paper on quantum Turing machines [2]. His work was influenced 
by the work of Charles Bennett on classical reversible Turing machines from 1973 [3]. 

In 1982, the Nobel-winning physicist Richard Feynman famously imagined building a 
quantum computer to tackle problems in quantum mechanics [4, 5]. The theory of quantum 
mechanics aims to simulate material and chemical processes by predicting the behavior of the 
elementary particles involved, such as the electrons and the nuclei. These simulations quickly 
become unfeasible on traditional digital computers, which simply could not model the staggering 
number of all possible arrangements of electrons in even a very small molecule. Feynman then 
turned the problem around and proposed a simple but bold idea: why don't we store information 
on individual particles that already follow the very rules of quantum mechanics that we try to 
simulate? He remarked: 


“If you want to make a simulation of nature, you’d better make it quantum mechanical,
and by golly it’s a wonderful problem, because it doesn't look so easy.” 

The idea of quantum computation was made rigorous by pioneers including David
Deutsch [6, 7] and David Albert [8]. Since then, the development of quantum computing has 
profoundly altered how physicists and chemists think about and use quantum mechanics. For in- 
stance, by inventing new ways of encoding a quantum many-body system as qubits on a quantum 
computer, we gain insights on the best quantum model for describing the electronic structure 
of the system. It gives rise to interdisciplinary fields like quantum computational chemistry. As 
recent experimental breakthroughs and theoretical milestones in quantum simulation are made, 
we can no longer talk about how to study a quantum system without bringing quantum com- 
putation to the table. 


1.1.1 THE RISE OF A NEW COMPUTING PARADIGM 


For computer scientists, the change that quantum computing brings has also been nothing short 
of astounding. It is so far the only new model of computing that is not bounded by the extended 
Church-Turing thesis [9, 10], which states that all computers can only be polynomially faster 
than a probabilistic Turing machine. Strikingly, a quantum computer can solve certain compu- 
tational tasks drastically more efficiently than anything ever imagined in classical computational 
complexity theory. 

It was not until the mid-1990s that the power of quantum computing became fully
appreciated. In 1993, Bernstein and Vazirani [9] demonstrated a quantum algorithm with
exponential speedup over any classical algorithm, deterministic or randomized, for a computational
problem named recursive Fourier sampling. Many more astonishing discoveries followed. In
1994, Dan Simon [10] showed another computational problem for which a quantum computer has
an exponential advantage over any classical computer.

Then in the same year, Peter Shor [11, 12] discovered that more problems, namely
factoring large integers and solving discrete logarithms, also have efficient solutions on a quantum
computer, far more efficient than any known classical algorithm. The implication of this
discovery is breathtaking. Existing cryptographic codes encrypt today's private network commu- 
nications, data storage, and financial transactions, relying on the fact that prime factorization 
for sufficiently large integers is so difficult that the most powerful digital supercomputers could 
take thousands or millions of years to compute. But the security of our private information could 
be under threat, should a quantum computer capable of running Shor's algorithm be built. 

In 1996, Lov Grover discovered another algorithm [13]. Once again, a quantum
algorithm was shown to provide an improvement over classical algorithms; in this case, Grover's
algorithm exhibits a quadratic speedup for the problem of unstructured database search, in which we
are given a database and aim to find some marked items. For example, given an unordered set
$S$ of $N$ elements, we want to find where $x \in S$ is located in the set. Classically, we need $O(N)$
accesses to the database in the worst case, while quantumly, we can do it with $O(\sqrt{N})$ accesses.
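
To make the quadratic speedup concrete, the following is a minimal classical simulation of Grover's search on a state vector, assuming NumPy and a single marked item. The function name `grover_search` and the iteration count of roughly $(\pi/4)\sqrt{N}$ are choices made for this sketch; it simulates the amplitudes on a classical computer and is not an implementation for quantum hardware.

```python
import numpy as np


def grover_search(num_items, marked, num_iterations=None):
    """Simulate Grover's algorithm; return (probabilities, oracle queries used)."""
    if num_iterations is None:
        # About (pi/4) * sqrt(N) iterations maximize the success probability.
        num_iterations = int(np.floor(np.pi / 4 * np.sqrt(num_items)))
    # Start in the uniform superposition over all N items.
    state = np.full(num_items, 1 / np.sqrt(num_items))
    for _ in range(num_iterations):
        # Oracle: flip the sign of the marked item's amplitude (one query).
        state[marked] *= -1
        # Diffusion: reflect every amplitude about the mean amplitude.
        state = 2 * state.mean() - state
    # Amplitudes stay real here, so squaring gives measurement probabilities.
    return state ** 2, num_iterations


probs, queries = grover_search(num_items=64, marked=42)
print("oracle queries:", queries)
print("probability of measuring the marked item:", probs[42])
```

With 64 items, the simulation uses only a handful of oracle queries, whereas a classical search may need to inspect up to all 64 entries in the worst case.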

These are just a few examples of quantum algorithms that have been discovered. When
implemented appropriately on a quantum computer, they offer efficient solutions to problems 
that seem to be intractable in the classical computing paradigm. 

Building a quantum computer is, however, extremely challenging. When the idea was first 
proposed, no one knew how to build such powerful computers. To realize the computational 
power, we must learn to coherently control and manipulate highly-entangled, complex, physical 
systems to near perfection. 

In the last 30 years or so, technologies for manufacturing quantum chips have significantly 
advanced. Today, we are at an exciting time where small- and intermediate-scale prototypes have 
been built. It marks a new era for quantum computing, as John Preskill, a long-time leader in 
quantum computing at Caltech, puts it, “we have entered the Noisy Intermediate-Scale Quantum 
(NISQ) era,” [14] in which quantum computing hardware is becoming large and reliable enough 
to perform small useful computational tasks. Research labs in both academia and industry,
at home and abroad, are now eager to experimentally demonstrate a first application of quantum
computers to some real-world problem that any classical computer would have a hard time
solving efficiently.


1.1.2 WHAT IS A QUANTUM COMPUTER? 


In a nutshell, a quantum computer is a computing device that stores information in objects called
quantum bits (or qubits) and transforms them by exploiting certain very special properties from
quantum mechanics. Despite the peculiarity in the behavior of quantum mechanical systems
(e.g., particles at very small energy and distance scales), quantum mechanics is one of the most
celebrated and well-tested theories for explaining those behaviors. Remarkably, the non-intuitive
properties and transformations in quantum systems have significant computational implications,
as they allow a quantum computer to operate on an exponentially large computational space.
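
The phrase "exponentially large computational space" can be made concrete by counting what a classical computer would have to store to describe an $n$-qubit state: $2^n$ complex amplitudes. The short sketch below assumes 16 bytes per amplitude (two 64-bit floating-point numbers); the function name `statevector_bytes` is hypothetical.

```python
def statevector_bytes(num_qubits, bytes_per_amplitude=16):
    """Bytes needed to store all 2**n complex amplitudes of an n-qubit state."""
    return (2 ** num_qubits) * bytes_per_amplitude


for n in (10, 30, 50):
    print(n, "qubits:", 2 ** n, "amplitudes,", statevector_bytes(n), "bytes")
```

Each additional qubit doubles the storage requirement, which is why direct state-vector simulation becomes impractical well before the qubit count reaches the hundreds.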

In contrast, a traditional digital computer stores information in a sequence of bits, each of 
which takes two possible values, 0 or 1, represented by the on and off of a transistor switch, for 
example. To manipulate information, it sends an input sequence of bits in the form of electrical 
signals through integrated circuits (IC) to produce another sequence of bits. This process is 
deterministic and fast, thanks to the advanced technologies in IC fabrication. Computers today 
can afford billions of instructions per second, without worrying about experiencing an error for 
billions of device hours. 


1.2 MODELS OF QUANTUM COMPUTATION 


The approaches to quantum computing (QC) can be roughly split into three main categories: 
(i) analog QC, (ii) digital gate-based QC, and (iii) measurement-based QC. 


1.2.1 ANALOG MODEL


In analog QC [15, 16], one gradually evolves the state of a quantum system using quantum 
operations that smoothly change the system such that the information encoded in the final 
system corresponds to the desired answer with high probability. When the quantum system
is restricted to evolve slowly and remains in its ground state throughout the evolution,
this approach is typically referred to as “adiabatic quantum computing” [17, 18]. When
the restriction is lifted and the system is allowed to interact with the thermal environment, it is
referred to as “quantum annealing” [19, 20]. This analog approach is pursued by companies
including D-Wave Systems, Google, and others. However, whether or not existing quantum
annealing devices achieve universal quantum computation or any quantum speedup remains 
unclear. 
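
As a toy illustration of the adiabatic idea, the sketch below tracks the ground state of a single-qubit Hamiltonian that is smoothly interpolated from a simple starting Hamiltonian to a final one. The specific schedule $H(s) = -(1-s)\,\sigma_x - s\,\sigma_z$ is an assumption chosen for illustration and is not taken from this book; the code diagonalizes $H(s)$ at each step rather than simulating the time evolution itself.

```python
import numpy as np

sigma_x = np.array([[0, 1], [1, 0]], dtype=float)
sigma_z = np.array([[1, 0], [0, -1]], dtype=float)


def interpolated_hamiltonian(s):
    """Illustrative schedule H(s) = -(1 - s) * X - s * Z for s in [0, 1]."""
    return -(1 - s) * sigma_x - s * sigma_z


def ground_state_and_gap(hamiltonian):
    """Return the lowest-energy eigenvector and the gap to the next level."""
    energies, vectors = np.linalg.eigh(hamiltonian)  # eigenvalues in ascending order
    return vectors[:, 0], energies[1] - energies[0]


schedule = np.linspace(0.0, 1.0, 11)
gaps = [ground_state_and_gap(interpolated_hamiltonian(s))[1] for s in schedule]

initial_ground, _ = ground_state_and_gap(interpolated_hamiltonian(0.0))
final_ground, _ = ground_state_and_gap(interpolated_hamiltonian(1.0))
print("smallest energy gap along the schedule:", min(gaps))
```

In this toy schedule the ground state moves from $|+\rangle$ at $s = 0$ to $|0\rangle$ at $s = 1$, and the energy gap never closes, which is the condition that allows a slowly evolving system to stay in its ground state.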


1.2.2 GATE-BASED MODEL


In digital QC, information is encoded onto a discrete and finite set of quantum bits (qubits),
and quantum operations are broken down to a sequence of a few basic quantum logic gates.
We obtain the correct answer with high probability from the digital measurement outcomes
of the qubits. A digital QC is typically more sensitive to noise from the environment than an
analog QC. For instance, qubit decoherence is usually considered undesirable in digital QC
except sometimes during initialization and measurement, whereas in adiabatic QC, decoherence
helps the system relax to the ground energy state [17, 21]. In the NISQ era, noise including
qubit decoherence, imprecise control, and manufacturing defects has non-negligible detrimental
effects and can accumulate when running long quantum algorithms, necessitating noise mitigation
techniques to protect the information during the computation. Devices operating in this regime
are called “NISQ digital quantum computers.” In principle, the discretization of information
allows for the discretization of errors and the use of redundancy to encode information, which
gives rise to the use of quantum error correction (QEC) to achieve system-level fault tolerance.
However, the overheads of conventional QEC approaches are found to be prohibitive in the near
term. Devices that implement QEC are called “fault-tolerant quantum computers.” Throughout the
remainder of the book, the discussion will be centered around NISQ digital QC. Nonetheless, the
general principles and techniques introduced here are applicable to all types of quantum computers.
We refer interested readers to a number of pertinent textbooks, reviews, and theses [22-29].
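
The gate-based picture can be illustrated with a two-qubit example: a sequence of two basic gates (a Hadamard followed by a controlled-NOT) turns $|00\rangle$ into an EPR pair, and measurement then yields digital outcomes. This is a minimal sketch assuming NumPy, with the first qubit taken as the most significant bit of the basis-state index; it simulates an ideal, noise-free device.

```python
import numpy as np

# Basic gates as unitary matrices.
H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)
I2 = np.eye(2)
CNOT = np.array([[1, 0, 0, 0],
                 [0, 1, 0, 0],
                 [0, 0, 0, 1],
                 [0, 0, 1, 0]])  # control: first qubit, target: second qubit

# Start in |00>, ordered as |00>, |01>, |10>, |11>.
state = np.zeros(4)
state[0] = 1.0

# Apply the gate sequence: Hadamard on the first qubit, then CNOT.
state = np.kron(H, I2) @ state
state = CNOT @ state

# Measurement probabilities in the computational basis.
probs = np.abs(state) ** 2

# Sample digital measurement outcomes (indices 0..3 encode 00, 01, 10, 11).
rng = np.random.default_rng(seed=0)
samples = rng.choice(4, size=1000, p=probs)
print("observed outcomes:", sorted(set(samples.tolist())))
```

Only the outcomes 00 and 11 appear in the samples, reflecting the perfect correlation between the two qubits of the EPR pair.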


1.2.3 MEASUREMENT-BASED MODEL


One example of measurement-based quantum computation (MBQC) is the cluster state
model; see a short review in [30]. In this model of quantum computation, one initializes a
number of qubits in the cluster state. The cluster state is represented by a graph, in which each
node is a qubit initialized in $|+\rangle$ and each edge denotes a controlled-Z gate. The graph can have
any topology, e.g., a 1-D chain or a 2-D grid. The computation process involves measuring (in
some measurement basis) some of the qubits in the cluster state. Some of the measurements are
possibly conditioned on the outcomes of previously measured qubits. The key observation here
is that each measurement equivalently accomplishes a quantum gate due to gate teleportation.
The output of the computation is the measurement bit-string outcome and the remaining state
of the qubits that are not measured. It is shown that this is a universal quantum computation
model.

Figure 1.1: A QPU (quantum processing unit) and how it interacts with classical computers.

The focus of this book is on the gate-based model; we present the other models of
computation here for completeness, but details of the models are out of the scope of this book.
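
The cluster state used in the measurement-based model can be constructed directly from its graph description: every qubit starts in $|+\rangle$, and every edge applies a controlled-Z gate. The following is a minimal sketch assuming NumPy; the function name `cluster_state` and the convention that qubit 0 is the most significant bit of the basis-state index are choices made for this example.

```python
import numpy as np


def cluster_state(num_qubits, edges):
    """Return the state vector of the cluster state defined by a graph."""
    dim = 2 ** num_qubits
    # All qubits in |+>: a uniform superposition over all bit strings.
    state = np.full(dim, 1 / np.sqrt(dim), dtype=complex)
    for a, b in edges:
        # Controlled-Z on qubits a and b: flip the sign of every basis
        # state in which both qubits are 1.
        for index in range(dim):
            bit_a = (index >> (num_qubits - 1 - a)) & 1
            bit_b = (index >> (num_qubits - 1 - b)) & 1
            if bit_a and bit_b:
                state[index] *= -1
    return state


# A 1-D chain of three qubits: 0 - 1 - 2.
chain = cluster_state(3, edges=[(0, 1), (1, 2)])
print(np.round(chain.real * np.sqrt(8)))
```

Because controlled-Z gates are diagonal and commute with one another, the order in which the edges are applied does not affect the resulting state.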

Quantum computing hardware is currently envisioned to be a hardware accelerator for classical
computers, as shown in Figure 1.1. To some extent, it is like a processing unit specialized in 
dealing with quantum information, in a way similar to a GPU (graphics processing unit) that 
specializes in numerical acceleration of kernels, including creation of images for display. For 
this reason, the QC hardware is referred to as a QPU (quantum processing unit). Unlike a GPU, 
which can perform arithmetic logic and data fetching at the same time, a QPU does not fetch 
data or instructions on its own. A host processor controls every move of the QPU. Let us now
dive deeper into the architectural design of a QPU. 

It is often misunderstood that a quantum computer is going to replace all classical digital
computers. A quantum computer should never be viewed as a competitor with a classical
computer. In fact, classical processing and classical control play vital roles in quantum computing.
On one hand, a quantum algorithm generally involves classical pre- or post-processing. On the
other hand, efficient classical controls are needed for running the algorithm on hardware. As
such, a better way of regarding the QC hardware is as a co-processor or an accelerator, that is a
QPU, as opposed to direct replacements of classical computers.

Computer Systems in 1950s: Algorithms; Assembly Language, Circuit Synthesis; Devices (Vacuum Tubes)
Computer Systems Today: Algorithms; High-Level Languages; Compiler, OS; Architecture; Devices (Transistors)
Quantum Computer Systems: Algorithms; Quantum DSL, Compilation, Unitary Synthesis, Pulse Shaping, Noise Mitigation, Error Correction; Devices (Qubits)

Figure 1.2: Architectural designs of classical vs. quantum computers. The abstraction layers for
1950s classical computing, today’s classical computing, and quantum computing are compared.


1.3.1 ARCHITECTURAL DESIGN OF A QPU 


A quantum computer implements a fundamentally different model of computation than a modern
classical computer does. It would be surprising if the exact design of a classical computer
architecture extended well to a quantum computer [31, 32]. As shown in Figure 1.2, the architectural
design of a quantum computer resembles that of a classical computer in the 1950s, when device
constraints were so tight that full-stack sharing of information was required from algorithms to
devices. In time, as technology advances and resources become abundant, a quantum computer
may adopt the modularity and layering models seen in classical architectures. But
in the short term, as long as the NISQ era lasts, it is premature to copy the abstraction layers of
today’s conventional computer systems to a quantum system.

Furthermore, quantum information processing is fundamentally different from what computer
engineers are used to. For instance, for conventional computers, engineers go to great
lengths to minimize the noise caused by quantum mechanics in the transistor components.
Rather than suppressing its effects, a quantum computer harnesses the power of quantum
mechanics. As such, the control apparatus for a quantum computer would look drastically different
from that of a conventional computer.

In reality, for successful operation, a quantum computer must implement a well-isolated 
physical system that encodes a sufficiently large number of qubits, and controls these qubits with 
extremely high speed and precision in order to carry out computation. The rest of the section 
describes at a high level the key components in a fully functional quantum computer architecture. 
The development of digital quantum computers, for both the NISQ and FT eras, still faces
challenges, which include reliably addressing and controlling qubits and correcting errors.
The control complexity becomes overwhelming as the number of qubits scales up, necessitating
system-level automation to guarantee the successful execution of quantum programs [33]. As
such, classical computers are needed to control and assist the quantum processor. Quantum
computers are generally viewed as co-processors or accelerators of classical computers, as shown
in Figure 1.1.

Figure 1.3: Quantum computation is one of the promising technologies made for harnessing
the power of quantum systems.

To some extent, the quantum computer architecture illustrated above arguably resembles 
in-memory processing or reconfigurable computing architectures. As shown in Figure 1.1, inside 
a QPU, quantum data are implemented by physical (quantum mechanical) objects such as atoms 
while quantum gates are control signals such as lasers acting on the data—this "gates-go-to-data" 
model of computation motivates a control unit close to the quantum data and an interface that 
talks frequently with the quantum memory and the classical memory. 


1.4 QUANTUM TECHNOLOGIES 


The broad field of quantum technology encompasses more than just quantum computation; it 
can be roughly divided into four domains shown in Figure 1.3. (i) In quantum computation, 
quantum systems are carefully isolated and controlled to store and transform information in a 
way that is promised to be drastically more efficient than classical digital computers. (ii) Quantum 
communication [34-39] aims to use entangled photons to encrypt and transmit data securely. 
(iii) To study the structure and properties of electronic systems, quantum simulation [4, 40-46]
maps the problem to a well-defined, controlled quantum system to mimic the behavior of
quantum systems of interest. (iv) Quantum sensing [47-49] uses quantum coherence to improve
precision measurements of physical quantities. Each domain has its own focus, yet one can
usually benefit from the techniques developed for another. Although mainly about quantum
computation, this book will extend its discussions from time to time to the other domains of
quantum technologies.


1.5 A ROAD MAP FOR QUANTUM COMPUTERS


Today’s quantum computers resemble, in many aspects, the digital computers we had in the 
1950s. They are large in physical size, limited in the number of computing units, expensive to 
build, and demanding in power. The machines we build in the NISQ era will be equipped with 
50-1000 qubits and are capable of performing operations with error rates around $10^{-3}$ or $10^{-4}$.
The state-of-the-art quantum gate error rate is around 1077. As a result, when programming 
for a quantum computer, we have no choice but to optimize every bit of the limited resources. 
When qubits are not only short-lived but also limited in number and when quantum logic gates 
are noisy, every variable and every instruction in a quantum program matter. 

In fact, we have been here before. After all, this is where we started for classical digital 
computers. The introduction of integrated circuits (ICs) in the 1960s laid the groundwork
for the impressive performance growth of contemporary digital computers. In 1965, Gordon
Moore accurately projected the exponential growth in the number of transistors per integrated
circuit based on the cost of IC fabrication, a trend now known as Moore's Law. After half a century
of investment and development in hardware, architecture, and software, we have built a com- 
puting ecosystem that has deeply changed our society and transformed the way we live, work 
and communicate [50]. 

Many believe that similar scaling will be achieved for quantum computers. For instance, 
the reported qubit coherence times (i.e., lifetime) for superconducting qubits have been on track 
for an encouraging exponential increase so far, following the so-called Schoelkopf's Law. But
whether this scaling will last relies on continued investment in the field of quantum computing, 
driven by not only our scientific curiosity but also its economic and social impacts. 

Fueled by joint efforts from research institutions and technology companies worldwide, 
progress in quantum hardware has been impressive. IBM [51, 52] and Google [53] are test- 
ing superconducting machines with more than 50 quantum bits (qubits) and providing users 
with cloud access to their prototypes. Intel [54, 55] is building quantum computers with silicon 
spin qubits and cryogenic controls. IonQ [56] has announced a 79-qubit “tape-like” trapped-ion 
quantum computer. Other multinational companies, including Intel, Microsoft, and Toshiba, 
are also making efforts toward practical-scale, fully programmable quantum computers. Many
others, although not building prototypes themselves, are joining forces by investing in the
field of quantum computing. Machines of up to 100 qubits are around the corner, and even 1,000
qubits appear buildable. John Preskill notes that we are at a "privileged time in the history of
science and technology" [14]. Specifically, classical supercomputers cannot simulate quantum
machines larger than 50-100 quantum bits. Emerging physical machines will bring us into
unexplored territory and will allow us to learn how real computations scale in practice.

Figure 1.4: Status of qubit technologies [57-67]. Also drawn is the gap between algorithms and
realistic machines. Breaking abstractions via software-hardware co-design will be key in closing
this gap for NISQ computers, hence the overarching theme of this book.


The key to quantum computation is that every additional qubit doubles the computational 
space in which the quantum machines operate. However, this extraordinary computing power 
is far from fully realized with today's technology, as the quantum machines will have high error 
rates for some time to come. Ideally in the long term, we would use the remarkable theory of 
quantum error correction codes to support error-free quantum computations. The idea of error 
correction is to use redundant encoding, i.e., grouping many physical qubits to represent a
single, fault-tolerant logical qubit. As a consequence, a 100-qubit machine can only support, for
example, 3-5 usable logical qubits. Until qubit resources become much larger, another practical approach
would be to explore error-tolerant algorithms and use lightweight error-mitigation techniques 
in the near term. So NISQ machines imply living with errors and exploring the effects of noise 
on the performance and correctness of quantum algorithms. 
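
The doubling of the computational space with each added qubit can be made concrete with a little arithmetic. The sketch below is illustrative only: the function name `statevector_bytes` and the choice of 16 bytes per amplitude (two 64-bit floats for one complex number) are assumptions made here, not figures from the text. It estimates how much memory a classical machine would need just to store every amplitude of an n-qubit state.

```python
def statevector_bytes(num_qubits: int, bytes_per_amplitude: int = 16) -> int:
    """Bytes needed to store all 2**n complex amplitudes of an n-qubit state.

    bytes_per_amplitude=16 assumes one complex number stored as two 64-bit
    floats; this is an illustrative assumption, not a requirement.
    """
    if num_qubits < 0:
        raise ValueError("num_qubits must be non-negative")
    return (2 ** num_qubits) * bytes_per_amplitude


if __name__ == "__main__":
    for n in (10, 30, 50):
        # Each additional qubit doubles the number of amplitudes to store.
        print(f"{n} qubits -> {statevector_bytes(n)} bytes")
```

Under these assumptions a 50-qubit state already needs 2^54 bytes, which suggests why brute-force classical simulation becomes impractical in the 50-100 qubit range mentioned above.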


1.5.1 COMPUTER SCIENCE RESEARCH OPPORTUNITIES 


Despite technology advances, there remains a wide gap between the machines we expect and 
the algorithms necessary to make full use of their power. In Figure 1.4, we can see the size of 
physical machines (in this case trapped ion machines) over time. Ground-breaking theoretical 
work produced Shor's algorithm [12] for the factorization of the product of two primes and 
Grover's algorithm [13] for quantum search, but both would require machines many orders of 
magnitude larger than currently practical. This gap has led to a recent focus on smaller-scale, 
heuristic quantum algorithms in areas such as quantum simulation, quantum chemistry, and 
quantum approximate optimization algorithms (QAOA) [68]. Even these smaller-scale
algorithms, however, suffer from a gap of two to three orders of magnitude with respect to recent
machines. Relying solely on technology improvements may require 10-20 years to close even
this smaller gap.

A promising way to close this gap sooner is to create a bridge from algorithms to physical 
machines with a software-architecture stack that can increase qubit and gate efficiency through 
automated optimizations and co-design [31]. For example, recent work [69] on quantum circuit 
compilation tools has shown that automated optimization produces significantly more efficient 
results than hand-optimized circuits, even when only a handful of qubits are involved. The
advantages of automated tools will be even greater as the scale and complexity of quantum
programs grow. Other important targets for optimization are mapping and scheduling
computations to physical qubit topologies and constraints, specializing reliability and error mitigation
for each quantum application, and exploiting machine-specific functionality such as multi-qubit 
operators. 

Quantum computing technologies have recently advanced to a point where quantum de- 
vices are large and reliable enough to execute some applications such as quantum simulation 
of small-size molecules. This is an exciting new era because being able to program and control
small prototypes of quantum computers could lead to discoveries of more efficient algorithms
for real-world problems than anything ever imagined in the classical computing paradigm. Public
interest, including commercial and military interest, is essential in sustaining substantial
support for basic research. Recent discoveries of quantum applications in chemistry, finance,
machine learning, and optimization are just some early evidence of a promising future ahead.
Looking forward, we are on track to continuously grow the performance of quantum
hardware, complemented with the efficient, scalable, and robust software toolflow that it
demands [32, 70, 71].

CHAPTER 2


Think Quantumly About Computing


We begin this chapter with a presentation of the intuitions behind quantum information and
quantum computation (Section 2.1). They are then made rigorous with mathematical
formulations in Section 2.2.


2.1 BITS VS. QUBITS 


In this section, the elements of classical computing are compared and contrasted with those of 
quantum computing (Section 2.1.1). The introduction to quantum mechanics in this section 
is tailored for computer scientists, assuming no prior knowledge in physics (Section 2.1.2). A 
number of architectural implications are then introduced (Section 2.1.3), arising from the special
properties and transformations in this new computing paradigm. 


2.1.1 COMPUTING WITH BITS: BOOLEAN CIRCUITS 


Part of the learning curve of quantum computing (QC) stems from its unfamiliar nomenclature.
Some is required for expressing the special properties of quantum mechanics, but the rest is 
merely a reformulation of what we already know about what an ordinary computer can do. As 
such, to prepare the reader for later discussion in QC, we briefly revisit how classical digital 
computers work, but in the language and notation used by QC. In particular, we will review 
four fundamental concepts in the classical theory of computing: the circuit model, von Neumann 
architecture, reversible computation, and randomized computation. 


Boolean Circuits 

A number of classical models of computation have been developed to describe the components of a
computer necessary to compute a mathematical function. Some familiar ones include the Turing
machine model (sequential description) and the lambda calculus model (functional description). In
this section, we choose to review the Boolean circuit model of computation, which is considered
the easiest to extend to the theory of quantum computing. These models, although expressing
computability and complexity from different perspectives, are in fact equivalent. Specifically,
every function computable by an n-input Boolean circuit is also computable by a Turing
machine on length-n inputs, and vice versa. The size of a circuit, defined by the number of gates it
uses, is closely related to the running time of a Turing machine.

Figure 2.1: A Boolean circuit implementing the XOR function using a NAND gate, an OR gate,
and an AND gate. Lines are wires that transmit signals, and shaped boxes are gates. Signals are
copied/duplicated where wires split into two.

In a classical digital computer (under the Boolean circuit model), information is stored 
and manipulated in bits—strings of zeros and ones, such as 10011101. The two states of each 
bit in the string are represented in the computer by a two-level system, such as charge (1) or no 
charge (0) in a memory cell (for storing) and high (1) or low (0) voltage signal in a circuit wire 
(for transmitting). 

In the "bra-ket" notation invented by Paul Dirac in 1939, the state of a bit is denoted by the
symbol $|\cdot\rangle$. So, the two-level system can be written as $|0\rangle$ and $|1\rangle$, or
$|\uparrow\rangle$ and $|\downarrow\rangle$, or $|\text{charge}\rangle$ and $|\text{no charge}\rangle$, etc.
The above length-8 bit string can thus be written as
$|1\rangle |0\rangle |0\rangle |1\rangle |1\rangle |1\rangle |0\rangle |1\rangle$, or $|10011101\rangle$ for short.
Why is this called the "bra-ket" notation? In fact, $|\cdot\rangle$ is called the "ket"
symbol and $\langle\cdot|$ is called the "bra" symbol, and together they form a bracket
$\langle\cdot|\cdot\rangle$. Later, we will see that in the linear algebra representation of quantum
bits, they correspond to column vectors and row vectors, respectively. For now, the reader may
regard this notation as pure symbolism; its advantages will be clear once we discuss operations
on quantum bits.

Any computation can be realized as a circuit of Boolean logic gates. For example, the following
is a circuit diagram for computing the XOR function of two input bits,
$f(x_1, x_2) = x_1 \oplus x_2$ (Figure 2.1).

In this classical Boolean circuit, lines are "wires" that transmit signals, and boxes are "gates"
that transform the signals. Signals are copied/duplicated at places where wires split into two. The 
above shows one possible implementation of the XOR function with AND, OR, and NAND 
gates. It is well known that the NAND gate, along with duplication of wires and use of ancilla 
bits (i.e., ancillary input bits typically initialized to 0), is universal for computation. In other 
words, any Boolean function is computable by “wiring together" a number of NAND gates. 
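
The two claims above can be checked by brute force over all inputs. The sketch below is a minimal illustration: the function names are hypothetical, and the wiring in `xor_and_or_nand` (AND of an OR and a NAND) is one arrangement consistent with the caption of Figure 2.1, assumed here because the figure itself is not reproduced. The second function builds XOR from NAND gates alone, with wire duplication modeled by reusing a variable.

```python
from itertools import product


def nand(a: int, b: int) -> int:
    return 1 - (a & b)


def or_gate(a: int, b: int) -> int:
    return a | b


def and_gate(a: int, b: int) -> int:
    return a & b


def xor_and_or_nand(x1: int, x2: int) -> int:
    """XOR from one OR, one NAND, and one AND gate (assumed wiring)."""
    return and_gate(or_gate(x1, x2), nand(x1, x2))


def xor_nand_only(x1: int, x2: int) -> int:
    """XOR from four NAND gates; the wire carrying t is duplicated."""
    t = nand(x1, x2)
    return nand(nand(x1, t), nand(x2, t))


if __name__ == "__main__":
    for x1, x2 in product((0, 1), repeat=2):
        print(x1, x2, xor_and_or_nand(x1, x2), xor_nand_only(x1, x2))
```

Both functions agree with the truth table of XOR on all four inputs, which is the kind of exhaustive check that remains feasible only for very small circuits.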

The Boolean circuit model is a useful theoretical tool for analyzing what functions can be
efficiently implemented. It is also a convenient tool for computer architects and electrical
engineers, as it is close to the physical realization of today’s computers. The von Neumann
architecture is one example of a design of modern computers.

Figure 2.2: The von Neumann Architecture of a classical computer.

von Neumann Architecture

In the following we describe the key components comprising modern digital computers, first 
proposed by John von Neumann in 1945. In his description, a von Neumann architecture com- 
puter has these components, as shown in Figure 2.2: (i) a central processing unit (CPU) includ- 
ing an arithmetic logic unit (ALU) and a control unit; (ii) a random-access memory (RAM) 
that stores data and program instructions; (iii) input and output (I/O) devices; and (iv) external 
storage. 

The instruction set architecture (ISA), serving as the interface between hardware and
software, defines what a computer natively supports, including data types, registers, memory
models, I/O support, etc. Modern ISAs are commonly classified into two categories: (i) complex
instruction set computers (CISC), which support many specialized operations regardless of how
rarely they are used in a program, such as the Intel x86-family architecture; and (ii) reduced
instruction set computers (RISC), which include only a small number of essential operations,
such as the RISC-V architecture [72].

The CPU realizes (implements) the ISA. While its design can be very complex, the CPU 
typically has a control unit that fetches and executes instructions by directing signals accordingly, 
and an ALU that performs arithmetic and logic operations on data. Most modern CPUs are 
implemented in electric circuitry, as seen in the Boolean circuit model, printed on a flat piece of 
semiconductor material, known as an integrated circuit (IC). 

Over the past few decades, the production cost of ICs has been drastically reduced thanks to
advances in technology [50]. We can build transistors, the building blocks of an IC, smaller
and smaller, cheaper and cheaper. The number of transistors that can be economically printed
per IC has been growing exponentially over time, approximately doubling every 1.5 years. This
trend is referred to as "Moore's Law." But, as most believe, this trend is not sustainable,
due to both physical limitations and market size. It is expected that within five years the
feature size of transistors will stop shrinking at a few nanometers. As it approaches the atomic
level (also on the order of nanometers), noise from quantum mechanical processes will start to
dominate and perturb the system.


Reversible Computation 

The study of reversible computing originally arose from the motivation to improve computational
energy efficiency in the 1960s and 1970s, led by Landauer [73] and Bennett [74].
Quantum computers transform quantum bits reversibly (except for initialization and measure- 
ment). The connection between reversible computation and quantum mechanics was discovered 
by Benioff in the 1980s [2]. As a result, QC benefits a great deal from the study of reversible 
computing, and vice versa. Later, we will see the roles of reversible computing in quantum cir- 
cuits. 

According to the second law of thermodynamics, an irreversible bit operation, such as the
OR gate, must dissipate energy, typically in the form of heat. Specifically, suppose the output
of an OR gate is $|1\rangle$. We cannot infer what the inputs were: they could be any of $|01\rangle$,
$|10\rangle$, or $|11\rangle$. The von Neumann-Landauer limit states that at least $kT \ln 2$ of energy is
dissipated per irreversible bit operation, where $k$ is the Boltzmann constant and $T$ is the
temperature of the environment. However, some bit operations are theoretically (logically)
"reversible," in the sense that the output state uniquely determines the input state of the
operation. For example, the NOT gate is reversible. Flipping the state of a bit from $|0\rangle$ to
$|1\rangle$, or vice versa, does not create or erase information from the system. To some extent,
reversible also means time-reversible: the transformation done by a reversible circuit can be
reverted by applying the inverse transformation (which always exists).
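
To get a feel for the scale of the von Neumann-Landauer limit, the bound $kT \ln 2$ can be evaluated numerically. The sketch below is illustrative: the function name and the choice of 300 K as "room temperature" are assumptions made here, while the Boltzmann constant is the exact SI value.

```python
import math

BOLTZMANN_J_PER_K = 1.380649e-23  # Boltzmann constant k, exact SI value


def landauer_limit_joules(temperature_kelvin: float) -> float:
    """Minimum energy k*T*ln(2) dissipated per irreversible bit operation."""
    if temperature_kelvin < 0:
        raise ValueError("temperature must be non-negative")
    return BOLTZMANN_J_PER_K * temperature_kelvin * math.log(2)


if __name__ == "__main__":
    # 300 K is an assumed, illustrative room temperature.
    print(f"{landauer_limit_joules(300.0):.3e} J per erased bit at 300 K")
```

At 300 K the bound comes to roughly 2.9e-21 joules per irreversible bit operation, and it scales linearly with temperature.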

One could imagine that a computer can be built consisting solely of reversible operations. In
analogy to the NAND gate being universal for Boolean logic, is there a reversible gate set that
is universal? The answer is yes. To illustrate this, we introduce three example reversible gates:
the NOT gate, the CNOT (controlled-not) gate, and the Toffoli (controlled-controlled-not)
gate, all of which are self-inverse (i.e., applying the gate twice returns the bits to their original
state). Their Boolean circuit notations and truth tables can be found in Table 2.1. Specifically,
the NOT gate negates the state of the input bit. The CNOT gate is a conditional gate: the
state of the target bit $x_2$ is flipped if the control bit $x_1$ is $|1\rangle$. It is the reversible
version of the XOR gate. The Toffoli gate has two control bits, $x_1$ and $x_2$, and one target bit
$x_3$. Similarly, the target bit is flipped if both control bits are $|1\rangle$. It is particularly handy
as it can be used to simulate the NAND gate and the DUPE gate (with the use of ancillas), and thus
is a universal reversible gate.
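
The properties just described can be verified exhaustively for these small gates. The sketch below is a minimal illustration with hypothetical helper names: it models each gate as a function on tuples of bits, and shows how fixing ancilla inputs turns the Toffoli gate into NAND (target ancilla set to 1) and DUPE (one control set to 1, target ancilla set to 0).

```python
from itertools import product


def not_gate(x1: int) -> tuple:
    return (1 - x1,)


def cnot(x1: int, x2: int) -> tuple:
    """Flip target x2 when control x1 is 1."""
    return (x1, x2 ^ x1)


def toffoli(x1: int, x2: int, x3: int) -> tuple:
    """Flip target x3 when both controls x1 and x2 are 1."""
    return (x1, x2, x3 ^ (x1 & x2))


def nand_via_toffoli(a: int, b: int) -> int:
    # Ancilla target starts at 1, so the target output is 1 XOR (a AND b).
    return toffoli(a, b, 1)[2]


def dupe_via_toffoli(a: int) -> tuple:
    # One control is an ancilla fixed to 1 and the target ancilla starts at 0,
    # so the target output becomes a copy of a.
    out = toffoli(a, 1, 0)
    return (out[0], out[2])


def is_self_inverse(gate, num_bits: int) -> bool:
    """Check that applying the gate twice restores every possible input."""
    return all(gate(*gate(*bits)) == bits
               for bits in product((0, 1), repeat=num_bits))


if __name__ == "__main__":
    print(is_self_inverse(not_gate, 1),
          is_self_inverse(cnot, 2),
          is_self_inverse(toffoli, 3))
```

The extra outputs that `nand_via_toffoli` throws away are the "garbage" bits of a reversible circuit: they are what keeps the overall map invertible.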

More formally, we note that the Toffoli gate is universal, in that any (possibly
non-reversible) Boolean logic can be simulated with a circuit consisting solely of Toffoli gates,
given that ancilla inputs and garbage outputs are allowed. Proof of this theorem is omitted here.
As such, a generic reversible circuit has the form shown in Figure 2.3.

Table 2.1: Reversible logic gates. The truth table of a reversible logic gate shows a permutation
of bit strings. The Toffoli gate is universal for reversible computation.

CNOT:
$|00\rangle \mapsto |00\rangle$
$|01\rangle \mapsto |01\rangle$
$|10\rangle \mapsto |11\rangle$
$|11\rangle \mapsto |10\rangle$

In this circuit, a Boolean function $f : \{0,1\}^n \to \{0,1\}^m$ is computed reversibly using only
Toffoli gates. All ancilla inputs are initialized to $|1\rangle$ (if needed, a $|0\rangle$ ancilla can be
produced as well, because a Toffoli gate on $|111\rangle$ gives $|110\rangle$). All garbage bits are
discarded at the end of the circuit.

One cannot overemphasize the implication of the above theorem for quantum computing. As
noted before, a quantum computer transforms quantum bits reversibly, so this theorem implies
that any Boolean circuit can be transformed into a reversible one, and then into a quantum one by
implementing a quantum Toffoli gate and replacing each bit with a quantum bit. Reversible
circuit synthesis is a useful tool in designing quantum circuits.

Figure 2.3: A generic reversible circuit for implementing a possibly irreversible function
$f : \{0,1\}^n \to \{0,1\}^m$.


Randomized Computation 
So far, we have not discussed one familiar ingredient of computation that appears commonly
in classical computing: randomness. Many natural processes exhibit unpredictable behavior,
and we should be able to take advantage of this unpredictability in computation and algorithms.
The notion of probabilistic computation seems realistic and necessary. On one hand, the physi- 
cal world contains randomness, as commonly seen in quantum mechanics. On the other hand, 
we can propose several computational problems that we do not yet know how to solve efficiently 
without randomness. If BPP=P, however (i.e., the complexity class bounded-error probabilistic 
polynomial time is equivalent to the class deterministic polynomial time), as some believe, then 
randomness is unnecessary and we can simulate randomized algorithms as efficiently with de- 
terministic ones. Nonetheless, randomness is still an essential tool in modeling and analyzing 
the physical world. We can find many examples where randomness is useful: in economics, it is 
well known that Nash equilibrium always exists if players can have probabilistic strategies, and 
in cryptography, a secret key typically relies on the uncertainty in itself. 

Randomness as a resource is typically used in computation in the following two forms: 
(i) an algorithm can take random inputs; and (ii) an algorithm is allowed to make random 
choices. As such, we introduce the notion of random bits and coin flips, again in the “bra-ket” 
and circuit notations. 

Suppose $x_1$ is a random bit, and the state of $x_1$ is $|0\rangle$ with probability $\frac{1}{2}$ and
$|1\rangle$ with probability $\frac{1}{2}$, denoted as:

$$|x_1\rangle = \frac{1}{2}|0\rangle + \frac{1}{2}|1\rangle.$$


For now, this notation may look strange and cumbersome, but the benefit of writing the
state of a bit this way will become clear when we generalize to the quantum setting. The state is
called a probability distribution of $|0\rangle$ and $|1\rangle$. To describe a general n-bit probabilistic
system, we write down the underlying state of the system as:

$$\sum_{b \in \{0,1\}^n} p_b |b\rangle,$$

where $b$ is any possible length-n bit-string, and $p_b$ is called the probability of $b$. By basic
principles of probability, all $p_b$ values must be non-negative and sum to 1.

In reality, the physical system is in one of those possible states. When we execute a
randomized algorithm, we expect to observe (sample) the outcome at the end. From the observer's
perspective, the values of the random bits are uncertain (hidden) until they are observed. Once
some of the random bits in the system are observed, the state of the system (to the observer's
knowledge) is changed to reflect what was just learned, following the laws of conditional
probability. For example, a random system can be described with:


1 1 5 
[x1x2) = z 00 F 41100 F go TII. 


Now suppose it is observed that the first bit is |0) (the probability of this scenario is 
Pr[x; = 0] = H + i = 2). ‘The state of the system affer the observation is then conditioned 
on our observation: 


|x1X2) (given xı = 0) = a 00) + “a lo1) = ; 00) + ; o1). 

Here the bit-strings inconsistent with the outcome are eliminated, and the remaining ones 
are renormalized. 
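
The eliminate-and-renormalize step can be written as a short function. The sketch below is illustrative only: the function name is hypothetical, and the distribution in the usage example is made up for demonstration and is not the one from the text. Exact fractions are used so that the renormalized probabilities can be read off directly.

```python
from fractions import Fraction


def condition_on_bit(dist: dict, position: int, value: str):
    """Condition a distribution over bit-strings on one observed bit.

    dist maps bit-strings such as "01" to probabilities. Returns the
    probability of the observation and the renormalized distribution.
    """
    # Eliminate bit-strings inconsistent with the observed outcome.
    kept = {bits: p for bits, p in dist.items() if bits[position] == value}
    total = sum(kept.values())
    if total == 0:
        raise ValueError("observed outcome has probability zero")
    # Renormalize the remaining probabilities so they sum to 1.
    return total, {bits: p / total for bits, p in kept.items()}


if __name__ == "__main__":
    # Hypothetical two-bit distribution, chosen only for illustration.
    example = {"00": Fraction(1, 8), "01": Fraction(3, 8), "11": Fraction(1, 2)}
    prob, posterior = condition_on_bit(example, position=0, value="0")
    print(prob, posterior)
```

In this made-up example, observing the first bit to be 0 has probability 1/2, and the conditioned state assigns 1/4 to 00 and 3/4 to 01.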

In a randomized algorithm, we typically allow that (i) it is correct with high probability,
or (ii) it does not always run in the desired time. Some of the uncertainty comes from its ability
to make decisions based on the outcome of a coin flip. Now suppose we have implemented a
conditional-coin-flip gate, named CCOIN and drawn as -[CCOIN]- in circuit notation. When the
input bit is $|0\rangle$, CCOIN does nothing. When the input is $|1\rangle$, CCOIN tosses a fair coin:

$$\text{CCOIN} = \begin{cases} |0\rangle \mapsto |0\rangle, \\ |1\rangle \mapsto \frac{1}{2}|0\rangle + \frac{1}{2}|1\rangle. \end{cases}$$


Suppose we have a random program that reads: (1) Initialize a bit $x_1 = |1\rangle$. (2) Flip a
fair coin if $x_1$ is $|1\rangle$ and write the result to $x_1$. (3) Repeat step 2. In terms of a
circuit, the program looks like:

$x_1$: $|1\rangle$ -[CCOIN]-[CCOIN]-


One is interested in observing the outcome at the end of the program. Let's analyze the
circuit step by step. After the first CCOIN gate, $|x_1\rangle$ is set to $|0\rangle$ and $|1\rangle$ with equal
probability (i.e., $|x_1\rangle = \frac{1}{2}|0\rangle + \frac{1}{2}|1\rangle$). After the second CCOIN gate, the
state becomes
$|x_1\rangle = \frac{1}{2}|0\rangle + \frac{1}{2}\left(\frac{1}{2}|0\rangle + \frac{1}{2}|1\rangle\right) = \frac{3}{4}|0\rangle + \frac{1}{4}|1\rangle$.
It is convenient to write the above process in a state transition diagram:

The system is initialized in $|11\rangle$. After the first CCOIN gate, the system is put into a
random state: $\frac{1}{2}|01\rangle + \frac{1}{2}|11\rangle$. The CNOT gate then transforms the system to
$\frac{1}{2}|01\rangle + \frac{1}{2}|10\rangle$, correlating the two bits. And finally, after the second CCOIN gate:
$|x_1 x_2\rangle = \frac{1}{2}\left(\frac{1}{2}|00\rangle + \frac{1}{2}|01\rangle\right) + \frac{1}{2}|10\rangle = \frac{1}{4}|00\rangle + \frac{1}{4}|01\rangle + \frac{1}{2}|10\rangle$.
Again with a state transition diagram:
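
The step-by-step analysis of the two randomized circuits can be reproduced by pushing a probability distribution through each gate. The sketch below is a minimal illustration with hypothetical helper names. The gate placement in the two-bit circuit (CCOIN on the first bit, CNOT controlled by the first bit, CCOIN on the second bit) is inferred from the intermediate states listed above and should be read as an assumption.

```python
from collections import defaultdict
from fractions import Fraction

HALF = Fraction(1, 2)


def apply(dist: dict, gate) -> dict:
    """Push a distribution over bit-strings through a (possibly random) gate.

    gate(bits) returns a dict {output bit-string: transition probability}.
    """
    out = defaultdict(Fraction)
    for bits, p in dist.items():
        for new_bits, q in gate(bits).items():
            out[new_bits] += p * q
    return dict(out)


def set_bit(bits: str, index: int, value: str) -> str:
    return bits[:index] + value + bits[index + 1:]


def ccoin(target: int):
    """Conditional coin flip: randomize the target bit only if it is 1."""
    def gate(bits: str) -> dict:
        if bits[target] == "0":
            return {bits: Fraction(1)}
        return {set_bit(bits, target, "0"): HALF,
                set_bit(bits, target, "1"): HALF}
    return gate


def cnot(control: int, target: int):
    """Deterministic gate: flip the target bit if the control bit is 1."""
    def gate(bits: str) -> dict:
        if bits[control] == "0":
            return {bits: Fraction(1)}
        flipped = "0" if bits[target] == "1" else "1"
        return {set_bit(bits, target, flipped): Fraction(1)}
    return gate


def run(initial: str, gates) -> dict:
    dist = {initial: Fraction(1)}
    for gate in gates:
        dist = apply(dist, gate)
    return dist


if __name__ == "__main__":
    print(run("1", [ccoin(0), ccoin(0)]))
    print(run("11", [ccoin(0), cnot(0, 1), ccoin(1)]))
```

Because probabilities are never negative, contributions that reach the same bit-string can only add up; this is the property that changes once probabilities are replaced by amplitudes.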

2.1.2 COMPUTING WITH QUBITS: QUANTUM CIRCUITS


Finally, we present to the reader, as efficiently as possible, the fundamental concepts in the quan- 
tum mechanics model of computation. Many believe that quantum computing can be described 
simply as randomized computing with a twist where we allow the “probability” to take negative 
(possibly complex) values. Alternatively, it can also be described as reversible computing with 
an additional “Hadamard” gate. Therefore, the goal of this section is to argue the meanings and 
implications of these statements.

Quantum Circuit Model
As usual, we start by describing the state of the quantum system $\psi$ using the “bra-ket” notation
introduced earlier. Suppose $\psi$ is an n-qubit (quantum bit) system:

$$|\psi\rangle = \sum_{b \in \{0,1\}^n} \alpha_b |b\rangle,$$

where the coefficient $\alpha_b$ is called the amplitude (as opposed to probability) of the basis
bit-string $b$. Just like probabilities, the amplitudes have two constraints: (i) they can take any
complex values; and (ii) their squared magnitudes sum to 1:
$\sum_{b \in \{0,1\}^n} |\alpha_b|^2 = 1$. In the context of qubits, the probability distribution
across bit-strings is called the superposition of all bit-strings; the correlation between bits is
called entanglement of qubits. It is important to note that these are not renamings of the same
concepts,¹ as random bits and quantum bits are fundamentally different objects. Despite the
striking parallelism between the two, we should always be wary of the subtleties that
differentiate them when analyzing a random circuit vs. a quantum circuit.

To measure (observe) the outcome of a qubit, we follow almost exactly what we did with
a random bit. For an n-qubit system, if we measure all qubits at the end of a circuit,² then from
$|\psi\rangle = \sum_{b \in \{0,1\}^n} \alpha_b |b\rangle$, we observe the bit-string $|b\rangle$ with
probability $|\alpha_b|^2$. Upon measurement, the state of the system “collapses” to a single
definite classical value, $\text{Meas}(|\psi\rangle) = |b\rangle$, and can no longer revert to the
superposition it was in before. For example, the superposition state
$|\psi\rangle = \frac{1}{\sqrt{2}}|0\rangle + \frac{1}{\sqrt{2}}|1\rangle$ yields, upon measurement, either
outcome with equal probability:

$$\Pr[\text{Meas}(|\psi\rangle) = |0\rangle] = \Pr[\text{Meas}(|\psi\rangle) = |1\rangle] = \frac{1}{2}.$$
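
The measurement rule above translates directly into code: square the magnitude of each amplitude to get a probability, then sample one bit-string. The sketch below is a minimal illustration; the function names, the dictionary representation of a state, and the normalization tolerance are assumptions made here for demonstration.

```python
import math
import random


def measurement_probabilities(amplitudes: dict) -> dict:
    """Map each bit-string b to |alpha_b|**2, checking normalization."""
    probs = {bits: abs(amp) ** 2 for bits, amp in amplitudes.items()}
    if not math.isclose(sum(probs.values()), 1.0, abs_tol=1e-9):
        raise ValueError("squared magnitudes of amplitudes must sum to 1")
    return probs


def measure(amplitudes: dict, rng: random.Random) -> str:
    """Sample one outcome; after this the state is the returned bit-string."""
    probs = measurement_probabilities(amplitudes)
    outcomes = list(probs)
    weights = [probs[bits] for bits in outcomes]
    return rng.choices(outcomes, weights=weights)[0]


if __name__ == "__main__":
    s = 1 / math.sqrt(2)
    state = {"0": s, "1": s}  # equal superposition of |0> and |1>
    rng = random.Random(0)    # seeded only to make runs repeatable
    print(measurement_probabilities(state))
    print([measure(state, rng) for _ in range(10)])
```

Amplitudes may be negative or complex; taking `abs` before squaring is what makes states that differ only in the sign of an amplitude give the same measurement statistics.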
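One operation that is of fundamental importance to quantum computation is called the
Hadamard transformation, a single-qubit quantum gate denoted as -[H]- in the circuit model:

$$H = \begin{cases} |0\rangle \mapsto \frac{1}{\sqrt{2}}|0\rangle + \frac{1}{\sqrt{2}}|1\rangle, \\ |1\rangle \mapsto \frac{1}{\sqrt{2}}|0\rangle - \frac{1}{\sqrt{2}}|1\rangle. \end{cases}$$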

It turns out that allowing Hadamard gates in a reversible circuit (consisting of Toffoli gates)
extends the circuit model to any function that can be computed on qubits (up to global
phase). For this reason, the H gate together with the Toffoli gate is universal for quantum
computation. Note that this does not mean that Nature allows only Hadamard and Toffoli
transformations on qubits. As we will see in later sections, the laws of quantum mechanics allow a
class of transformations called unitary transformations.

¹ Many believed that quantum mechanics has deterministic explanations, notably through the
argument of the EPR paradox (by Einstein, Podolsky, and Rosen in 1935 [75]) and other
hidden-variable theories that try to equate entanglement with statistical correlation. But later,
in 1964, John Bell famously proved Bell's theorem [76], which rules out local hidden variables of
some types.
² This is a reasonable assumption by the principle of deferred measurement.

One would argue that any interesting quantum mechanical phenomenon can be explained
by interference. Unlike probability values, which are always non-negative, amplitudes (being
possibly negative) can either accumulate or cancel. When two amplitudes accumulate, we say they
interfere constructively; when they cancel each other out, we say they interfere destructively. The
example circuit below illustrates this phenomenon:





$x_1$: $|0\rangle$ -[H]-[H]-




















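As usual, let's analyze the circuit step by step. After the first H gate, $|x_1\rangle$ is set to
a superposition state, $|x_1\rangle = \frac{1}{\sqrt{2}}|0\rangle + \frac{1}{\sqrt{2}}|1\rangle$. After the second
H gate, the state becomes

$$|x_1\rangle = \frac{1}{\sqrt{2}}\left(\frac{1}{\sqrt{2}}|0\rangle + \frac{1}{\sqrt{2}}|1\rangle\right) + \frac{1}{\sqrt{2}}\left(\frac{1}{\sqrt{2}}|0\rangle - \frac{1}{\sqrt{2}}|1\rangle\right) = |0\rangle.$$

Note that in this circuit, the amplitudes of $|1\rangle$ cancel each other out (destructively
interfere), while those of $|0\rangle$ accumulate (constructively interfere).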
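
The cancellation can be observed numerically by applying the Hadamard map to a single-qubit state twice. The sketch below is a minimal illustration; representing a state as a dictionary of two real amplitudes and the name `hadamard` are choices made here for demonstration.

```python
import math

S = 1 / math.sqrt(2)


def hadamard(state: dict) -> dict:
    """Apply H to a single-qubit state given as {"0": amp0, "1": amp1}."""
    a0 = state.get("0", 0.0)
    a1 = state.get("1", 0.0)
    return {
        "0": S * a0 + S * a1,  # contributions to |0> add with equal signs
        "1": S * a0 - S * a1,  # the contribution from |1> enters negatively
    }


if __name__ == "__main__":
    start = {"0": 1.0, "1": 0.0}
    after_one = hadamard(start)
    after_two = hadamard(after_one)
    print(after_one)
    print(after_two)
```

Up to floating-point rounding, the second application returns the qubit to the state it started in. A fair coin flipped twice, by contrast, would remain uniformly random.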

Again we track the state of the qubits as the circuit runs (from left to right), using a
transition diagram. In the context of qubit states, the diagram is called the Feynman Path diagram,
named after physicist Richard Feynman:

In this diagram, the state of the qubits also evolves from left to right.


Feynman's Sum-Over-Path Approach 


Here we describe the precise prescription for tracking the amplitudes of a quantum state using
Feynman's “sum-over-path” approach. The idea comes from the well-known theory of path
integrals [77, 78]. Feynman's key observation is that the final amplitudes of a quantum state can
be written as a weighted sum over all possible paths the quantum system can take from the
initial to the final state. In particular:
* the final amplitude is given by adding the contributions from all paths; and
* the contribution from a path is given by multiplying the coefficients along the
path.

In the above example, there are a total of four paths (from left to right):

1. $|0\rangle \to |0\rangle \to |0\rangle$: $\frac{1}{\sqrt{2}} \times \frac{1}{\sqrt{2}} = \frac{1}{2}$
2. $|0\rangle \to |0\rangle \to |1\rangle$: $\frac{1}{\sqrt{2}} \times \frac{1}{\sqrt{2}} = \frac{1}{2}$
3. $|0\rangle \to |1\rangle \to |0\rangle$: $\frac{1}{\sqrt{2}} \times \frac{1}{\sqrt{2}} = \frac{1}{2}$
4. $|0\rangle \to |1\rangle \to |1\rangle$: $\frac{1}{\sqrt{2}} \times \left(-\frac{1}{\sqrt{2}}\right) = -\frac{1}{2}$


The amplitude of |0) is obtained from adding path 1 and path 3, while that of |1) is obtained 
from adding path 2 and path 4. 
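
The two rules above translate directly into a short enumeration. The following Python sketch is only an illustration under stated assumptions: the helper name `sum_over_paths` and the dictionary encoding of the Hadamard coefficients are choices made for this example, not notation from the text.

```python
import math
from itertools import product

# Transition coefficients of the Hadamard gate: H[(src, dst)] is the factor
# picked up when the qubit moves from basis state `src` to basis state `dst`.
# (Dictionary encoding is an assumption of this sketch.)
S = 1 / math.sqrt(2)
H = {(0, 0): S, (0, 1): S, (1, 0): S, (1, 1): -S}

def sum_over_paths(gates, start):
    """Return {final_state: amplitude} by enumerating every path explicitly."""
    amplitudes = {0: 0.0, 1: 0.0}
    # A path records the basis state visited after each gate.
    for path in product((0, 1), repeat=len(gates)):
        contribution = 1.0
        current = start
        for gate, nxt in zip(gates, path):
            contribution *= gate[(current, nxt)]  # multiply coefficients along the path
            current = nxt
        amplitudes[current] += contribution       # add paths ending in the same state
    return amplitudes

final = sum_over_paths([H, H], start=0)
print(final)  # amplitude of state 0 is (numerically) 1, amplitude of state 1 is 0
```

The enumeration visits exactly the four paths listed above; paths 1 and 3 add up in state 0, while paths 2 and 4 cancel in state 1.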


One neat trick to prevent interference is to introduce an entangled ancilla qubit, as in
the following circuit:

x1 : |0⟩ ──[H]──●─────────
x2 : |0⟩ ───────⊕───[H]──





























In particular, the qubits are initialized to the all-zero state. After the first H gate and the CNOT
gate, we arrive at $|x_1x_2\rangle = \frac{1}{\sqrt{2}}|00\rangle + \frac{1}{\sqrt{2}}|11\rangle$. In fact, this state is an example of a “Bell state”
due to John Bell [76] (or an “EPR pair,” named after Einstein, Podolsky, and Rosen [75]), a
class of entangled states. With the Bell state, we now apply the second H gate. At the end of
the circuit, we obtain

$$|x_1x_2\rangle = \frac{1}{\sqrt{2}}\left(\frac{1}{\sqrt{2}}|00\rangle + \frac{1}{\sqrt{2}}|01\rangle\right) + \frac{1}{\sqrt{2}}\left(\frac{1}{\sqrt{2}}|10\rangle - \frac{1}{\sqrt{2}}|11\rangle\right).$$

This process is again illustrated with the following Feynman path diagram:

Figure 2.4: A generic quantum circuit that implements a unitary transformation, mapping from a
quantum state storing the input (and ancilla) to a quantum state storing the output (and garbage).


Quantum Entanglement between Two Qubits 


The Bell state, $|x_1x_2\rangle = \frac{1}{\sqrt{2}}|00\rangle + \frac{1}{\sqrt{2}}|11\rangle$, is of particular interest: if we pick either of the
two qubits to measure, we would obtain outcome $|0\rangle$ or $|1\rangle$ with equal probability, and
the other qubit is guaranteed to be measured in the same state as the first. This “correlation”
between the two qubits is called quantum entanglement. However, this is not to be confused
with two correlated random bits, each observed as 0 or 1 with equal probability.
Proof of the distinction is omitted for the sake of brevity; we refer the interested reader to
the studies on the "local hidden variables" theory, Bell's inequality [76, 79], and the CHSH
game [80]. In essence, the measurement of one qubit intrinsically alters and determines the
state of the other. Perhaps even more surprising is that these two qubits can be physically
far apart as long as they were previously entangled as a Bell state. To form such a relationship
between the two qubits, they must interact with each other, either directly through a
two-qubit gate as shown above, or indirectly through a photon or a third qubit. We will explain
in mathematical terms the meaning of entanglement in the next section. Also see [81] for
an extensive review.


A typical quantum circuit will look very much like a reversible circuit, except that the 
qubits are acted on by quantum gates (i.e., unitary transformations) as shown in Figure 2.4. 

To illustrate how quantum circuits are implemented in hardware, we encourage the reader 
to avoid thinking about lines in a circuit as wires that carry electric signals about the bits, but 
rather, as qubit registers (i.e., physical objects) that store data over time. In particular, in the 
quantum circuit notation, qubits are denoted as lines, and gates are denoted as boxes, which are 
applied to qubits in order from left to right. Physically, a qubit is a physical object integrated
on the quantum chip (such as an atom or a superconducting gadget) and a gate is a pulse signal
addressed to the qubits (such as a laser beam or a microwave pulse). This is to be contrasted
with a classical architecture, where a gate is an electric circuit component on the CPU and a bit
is a voltage signal along wires sent through gates and switches.


2.1.3 ARCHITECTURAL CONSTRAINTS OF A QUANTUM COMPUTER 


So far, we have discussed a number of exciting properties of qubits that together give rise to 
the power of a quantum computer, including superposition, entanglement, interference, etc. 
However, these properties are not easy to achieve—they incur strict requirements on the design 
of a quantum computer architecture, that is, the software and hardware structures of a quantum 
computer in practice. 


1. Probabilistic outcomes. Information in a quantum state is extracted through statistics from
measurements. Measuring an n-qubit system at the end of a program will not yield the
full information about the quantum state; instead, it reads out one of the $2^n$ possible
bit-strings at random. For this reason, a quantum program is intrinsically probabilistic. A good
quantum program will make sure the desired bit-strings are observed with much higher
probability than the undesired ones. Since it is not easy to entirely eliminate the
probability of obtaining an undesired result, a program must be executed multiple times
so as to gain statistical confidence in the observed outcome. Depending on the algorithm,
a quantum program may need thousands of shots before meaningful statistics about the
quantum state can be drawn. Algorithm designers typically use generic subroutines such
as amplitude amplification, which improves the distribution by increasing the probability
of the desired outcomes. The process for learning the full state information is called quantum
tomography. Estimating the full quantum state of n qubits requires $O(2^{2n})$ shots of
measurement.


2. No copying of qubits. Another fundamental limitation of a quantum computer system is
the inability to make identical copies of an unknown qubit. This is called the no-cloning
theorem, due to Wootters and Zurek [82]. Duplication of bits, as introduced in the classical
Boolean circuit model, is prohibited in a quantum circuit. In classical computing, we are
used to making copies of data when designing and programming algorithms. But in a
quantum system, we can no longer easily read from or write to the quantum memory,
as a read is done through measurements (which likely alter the data) and a write generally
requires complex state preparation routines. The no-cloning limitation also prevents us
from directly implementing a quantum analog of the classical memory hierarchy, as caches
require making copies of data. Hence, current quantum computer architecture proposals
follow the general principles that transformations are applied directly to quantum memory,
and data in memory are moved but not copied. Two similar scenarios should not be confused
with cloning: (i) we are allowed to make an entangled copy of a qubit.
In fact, for any arbitrary unknown state $|\psi\rangle = \alpha|0\rangle + \beta|1\rangle$, $\mathrm{CNOT}(|\psi\rangle, |0\rangle)$ results in
the entangled state $\alpha|00\rangle + \beta|11\rangle$. Measuring either of the two qubits would yield the same
statistical distribution; and (ii) we are allowed to rename (make an alias of) a qubit when
writing a quantum program.


3. Qubit-qubit interactions. Entanglement is arguably one of the essential concepts that give
a quantum computer its computational power. Without entanglement, an n-qubit
quantum system is no better than 2n random variables (as every qubit can be modeled
independently by two amplitudes). Any interesting quantum algorithm explores an
exponentially large computational space by entangling the qubits in the system. In
general, qubit-qubit interactions involve (i) directly applying complex multi-qubit gates,
or (ii) if direct coupling is not possible, either moving the qubits or interacting through
intermediate photons or qubits. This communication cost contributes to the very high
implementation cost of quantum algorithms on NISQ machines. On one hand, complex
multi-qubit gates are the dominant source of error as they are hard to implement. On the
other hand, today's NISQ machines typically have poor connectivity (i.e., only a few pairs of
qubits are allowed to interact directly). The design of a NISQ architecture is therefore inevitably
focused on reducing the overhead of qubit-qubit interactions.


4. Analog noises. A programmer seldom needs to worry about failures in memory cells caused
by external noise or gate operation mistakes in a conventional processor. This is because
classical computer systems today are robust against environmental noise. Take the
modern Intel Xeon Phi processor as an example. Data from 2017 [83] show that the
“soft-error rate” (charge disturbance from radiation that flips the data state of a memory
cell, register, latch, or flip-flop) is around 100 FIT ("failures in time"), i.e., 100 errors per
billion device hours (114,077 years). The experiments ran HPC (High Performance Computing)
applications for 500 hours under a neutron beam (roughly 57,000 years of equivalent natural
exposure). In stark contrast, NISQ machines are highly sensitive to environmental and control
noise: data from 2020 on the IBM Q Melbourne device (with 14 qubits) [84] show that the
average single-qubit gate error rate is $4.78 \times 10^{-3}$, the average two-qubit gate error
rate is $9.46 \times 10^{-2}$, and the readout error rate is $8.03 \times 10^{-2}$. Furthermore, a
qubit decoheres (loses quantum information) naturally: the average T1 (T2) decoherence time
for the above machine is about 50 μs (66 μs), due to spontaneous loss of energy (loss of phase).
Each of these errors by itself may have a small effect on the quantum state, but if not mitigated
or corrected, they can accumulate and become detrimental as we run long quantum programs.
The effects of noise are so significant that a NISQ computer architecture must find strategies
to reduce them.
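
The first two constraints can be made concrete with a small classical simulation. The sketch below is an illustration only: the function names (`sample_shots`, `cnot_on_psi_and_zero`, `hypothetical_clone`), the shot count, and the fixed random seed are assumptions of this example. It estimates outcome probabilities from repeated shots, and contrasts the entangled copy produced by CNOT with the product state that a (forbidden) cloner would have to output.

```python
import math
import random
from collections import Counter

def sample_shots(probabilities, shots, seed=0):
    """Simulate `shots` measurements drawn from a {bitstring: probability} table."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    outcomes = list(probabilities)
    weights = [probabilities[o] for o in outcomes]
    return Counter(rng.choices(outcomes, weights=weights, k=shots))

def cnot_on_psi_and_zero(alpha, beta):
    """Amplitudes of CNOT(|psi>, |0>) for |psi> = alpha|0> + beta|1>."""
    return {"00": alpha, "01": 0.0, "10": 0.0, "11": beta}

def hypothetical_clone(alpha, beta):
    """Amplitudes of |psi>|psi>, which a true cloner would have to produce."""
    return {"00": alpha * alpha, "01": alpha * beta,
            "10": beta * alpha, "11": beta * beta}

alpha = beta = 1 / math.sqrt(2)
entangled = cnot_on_psi_and_zero(alpha, beta)
cloned = hypothetical_clone(alpha, beta)

# Constraint 1: each shot returns a single bit-string; the frequencies only
# approach the probabilities |amplitude|^2 after many shots.
SHOTS = 4000
probs = {bits: abs(a) ** 2 for bits, a in entangled.items()}
counts = sample_shots(probs, shots=SHOTS)
estimate = {bits: counts[bits] / SHOTS for bits in probs}
print(estimate)

# Constraint 2: the entangled copy never yields "01" or "10", whereas two
# independent clones of |psi> would yield them half of the time.
print(entangled["01"], cloned["01"])
```

In the entangled copy the two qubits always agree, which is different from holding two independent copies of the original state.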


2.2 BASIC PRINCIPLES OF QUANTUM COMPUTATION


Now we present a more rigorous picture of the central concepts in quantum computation. The 
theory of quantum computation can be formulated as a neat branch of mathematics. If we consider
classical computation as operations under the laws of Boolean algebra, then quantum computation
operates under the rules of linear algebra. The probabilistic nature of quantum states adds
another layer of complexity to understanding the behavior of quantum computers. Nonetheless, 
all of it can be beautifully captured in four simple postulates, describing quantum states, compo- 
sition of quantum systems, measurements, and quantum gates, respectively. This mathematical 
formulation allows us to reason about how a quantum system behaves under our manipulation, 
i.e., quantum logic in a quantum computer. Throughout this section, basic linear algebra and 
probability theory concepts are reviewed or referenced when necessary. 

Together, the four postulates describe how information is stored and manipulated in a 
quantum system. In particular, a quantum computer works with a finite set of computational 
objects called quantum bits (or qubits). The quantum state postulate defines the superposition 
state of each qubit. The composition postulate then generalizes it to represent a system of multiple
qubits, and provides a formal, mathematical definition of the entanglement property. The
measurement postulate is used to describe how much information can be read out from a quan- 
tum system, as well as the consequence of the measurement action to the system. Finally, the 
quantum gate postulate defines the logical operations that transform a quantum system. 


2.2.1 QUANTUM STATES 


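Definition 2.1 (Superposition). A single-qubit quantum state $|\psi\rangle$ can be defined as a (column)
vector of two complex numbers:

$$|\psi\rangle = \begin{pmatrix} \alpha \\ \beta \end{pmatrix},$$

where $\alpha, \beta \in \mathbb{C}$ and $|\alpha|^2 + |\beta|^2 = 1$. Here, $\alpha$ and $\beta$ are called the amplitudes of the quantum
state. It is called a superposition state because we can rewrite it as a linear combination of the
basis states $|0\rangle = \begin{pmatrix} 1 \\ 0 \end{pmatrix}$ and $|1\rangle = \begin{pmatrix} 0 \\ 1 \end{pmatrix}$ as follows:

$$|\psi\rangle = \alpha|0\rangle + \beta|1\rangle = \alpha \begin{pmatrix} 1 \\ 0 \end{pmatrix} + \beta \begin{pmatrix} 0 \\ 1 \end{pmatrix} = \begin{pmatrix} \alpha \\ \beta \end{pmatrix}.$$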


Example 2.2 The two most common states are probably $|0\rangle$ and $|1\rangle$, often referred to as the
computational basis states. Here, we highlight a few more quantum states that appear fairly
frequently in quantum algorithms. For example, the “plus” and “minus” states:

$$|+\rangle = \frac{1}{\sqrt{2}}(|0\rangle + |1\rangle), \qquad |-\rangle = \frac{1}{\sqrt{2}}(|0\rangle - |1\rangle).$$

And two other states with complex amplitudes:

$$|i\rangle = \frac{1}{\sqrt{2}}(|0\rangle + i|1\rangle), \qquad |{-i}\rangle = \frac{1}{\sqrt{2}}(|0\rangle - i|1\rangle).$$

Last, we write the conjugate transpose of $|\psi\rangle$ as a row vector

$$\langle\psi| = (|\psi\rangle)^\dagger = \begin{pmatrix} \alpha^* & \beta^* \end{pmatrix}.$$

We can check the condition on the amplitudes by the inner product

$$\langle\psi|\psi\rangle = \alpha\alpha^* + \beta\beta^* = |\alpha|^2 + |\beta|^2 = 1.$$


More generally, we can extend to a qudit system: a d-dimensional qudit system is defined
as a superposition of d basis states:

$$|\psi\rangle = \alpha_0|0\rangle + \alpha_1|1\rangle + \cdots + \alpha_{d-1}|d-1\rangle,$$

where $|\alpha_0|^2 + \cdots + |\alpha_{d-1}|^2 = 1$. In theory, we can construct a qudit system using qubits.
However, in practice, many quantum systems are intrinsically multi-level systems. For example, a
superconducting transmon has infinitely many levels, among which the first few are easily
accessible. A three-dimensional qudit system is sometimes called a qutrit system.


2.2.2 COMPOSITION OF QUANTUM SYSTEMS 


Now, we illustrate how to represent a system consisting of multiple qubits. In classical computing,
when moving from a single-bit system to a system consisting of n bits, we use a
string of bits to represent the $2^n$ possible states that the system could be in, for there are exactly
2 choices for each bit. With two bits, there are four possible states, namely 00, 01, 10, and 11.
Naturally, intuition from the superposition principle tells us that, in a quantum computer, the
joint state of a two-qubit system should be a linear combination of the four possible basis states,
i.e., $|\psi\rangle = \alpha|00\rangle + \beta|01\rangle + \gamma|10\rangle + \delta|11\rangle$. Indeed, we can build a bigger quantum state from
small quantum states using a tensor product.


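Definition 2.3 (Composition). The joint state of two separate quantum systems $|\psi_0\rangle = \sum_j \alpha_j |a_j\rangle$
and $|\psi_1\rangle = \sum_k \beta_k |b_k\rangle$ is represented as the tensor product of the components. That is,

$$|\psi\rangle = |\psi_0\rangle \otimes |\psi_1\rangle = \sum_j \sum_k \alpha_j \beta_k \left(|a_j\rangle \otimes |b_k\rangle\right),$$

where $|a_j\rangle \otimes |b_k\rangle$ can often be shortened as $|a_j b_k\rangle$.

Example 2.4 The four basis states of the two-qubit system are

$$|00\rangle = |0\rangle \otimes |0\rangle = \begin{pmatrix} 1 \\ 0 \\ 0 \\ 0 \end{pmatrix}, \qquad |01\rangle = |0\rangle \otimes |1\rangle = \begin{pmatrix} 0 \\ 1 \\ 0 \\ 0 \end{pmatrix},$$

$$|10\rangle = |1\rangle \otimes |0\rangle = \begin{pmatrix} 0 \\ 0 \\ 1 \\ 0 \end{pmatrix}, \qquad |11\rangle = |1\rangle \otimes |1\rangle = \begin{pmatrix} 0 \\ 0 \\ 0 \\ 1 \end{pmatrix}.$$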


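Example 2.5 Let's take a look at the example of two generic qubits. Suppose the first qubit is
$|\psi_0\rangle = \alpha_0|0\rangle + \alpha_1|1\rangle$ and the second qubit is $|\psi_1\rangle = \beta_0|0\rangle + \beta_1|1\rangle$; then their joint state is:

$$|\psi\rangle = |\psi_0\rangle \otimes |\psi_1\rangle = \begin{pmatrix} \alpha_0 \\ \alpha_1 \end{pmatrix} \otimes \begin{pmatrix} \beta_0 \\ \beta_1 \end{pmatrix} = \begin{pmatrix} \alpha_0\beta_0 \\ \alpha_0\beta_1 \\ \alpha_1\beta_0 \\ \alpha_1\beta_1 \end{pmatrix}.$$

So we arrive at $|\psi\rangle = \alpha_0\beta_0|00\rangle + \alpha_0\beta_1|01\rangle + \alpha_1\beta_0|10\rangle + \alpha_1\beta_1|11\rangle$. One can quickly verify
that $|\alpha_0\beta_0|^2 + |\alpha_0\beta_1|^2 + |\alpha_1\beta_0|^2 + |\alpha_1\beta_1|^2 = (|\alpha_0|^2 + |\alpha_1|^2)(|\beta_0|^2 + |\beta_1|^2)$, which
equals 1 whenever $|\alpha_0|^2 + |\alpha_1|^2 = 1$ and $|\beta_0|^2 + |\beta_1|^2 = 1$, that is, whenever $|\psi_0\rangle$ and $|\psi_1\rangle$
are both valid quantum states.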
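
The tensor product in Example 2.5 corresponds to the Kronecker product of the amplitude vectors. The NumPy sketch below is an illustration with arbitrarily chosen example amplitudes (the specific values of `psi0` and `psi1` are assumptions of this example); it checks the four joint amplitudes and the normalization.

```python
import numpy as np

# Example amplitudes chosen for illustration only.
psi0 = np.array([0.6, 0.8j])              # alpha_0, alpha_1
psi1 = np.array([1, 1]) / np.sqrt(2)      # beta_0, beta_1

# Joint state |psi0> (x) |psi1> as a length-4 vector ordered 00, 01, 10, 11.
joint = np.kron(psi0, psi1)

# The same amplitudes written out term by term, as in the example.
expected = np.array([
    psi0[0] * psi1[0],   # |00>
    psi0[0] * psi1[1],   # |01>
    psi0[1] * psi1[0],   # |10>
    psi0[1] * psi1[1],   # |11>
])

norm = np.vdot(joint, joint).real         # <psi|psi>
print(np.allclose(joint, expected), norm)
```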


Example 2.6 It is important to note that not all multi-qubit states can be written in the tensor
product form. The class of multi-qubit quantum states that cannot be expressed in terms of a
tensor product of two quantum states is called the entangled states. One famous example is the
Bell state:

$$|\psi\rangle = \frac{1}{\sqrt{2}}(|00\rangle + |11\rangle).$$

The key observation is that $|\psi\rangle \neq (\alpha_0|0\rangle + \alpha_1|1\rangle) \otimes (\beta_0|0\rangle + \beta_1|1\rangle)$ for any valid choices of
$\alpha_0$, $\alpha_1$, $\beta_0$, and $\beta_1$. Quantum states that can be written in a tensor product of two states are called
the separable states or product states.
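
One way to see why the Bell state admits no such factorization: a product state has amplitudes $(\alpha_0\beta_0, \alpha_0\beta_1, \alpha_1\beta_0, \alpha_1\beta_1)$, so the product of the $|00\rangle$ and $|11\rangle$ amplitudes must equal the product of the $|01\rangle$ and $|10\rangle$ amplitudes. The following sketch applies this test to two-qubit pure states; the function name `is_product_state` and the tolerance are assumptions of this example.

```python
import numpy as np

def is_product_state(state, tol=1e-12):
    """Two-qubit pure-state test: separable iff a00*a11 - a01*a10 == 0.

    The four amplitudes are ordered 00, 01, 10, 11. The condition says the
    2x2 matrix of amplitudes has rank 1, i.e., it is an outer product.
    """
    a00, a01, a10, a11 = np.asarray(state, dtype=complex)
    return bool(abs(a00 * a11 - a01 * a10) < tol)

bell = np.array([1, 0, 0, 1]) / np.sqrt(2)
product = np.kron(np.array([0.6, 0.8]), np.array([1, 1]) / np.sqrt(2))

print(is_product_state(bell))      # False: the Bell state is entangled
print(is_product_state(product))   # True: built as a tensor product
```

For the Bell state, the test value is $\frac{1}{\sqrt{2}} \cdot \frac{1}{\sqrt{2}} - 0 = \frac{1}{2} \neq 0$, so no choice of $\alpha_0, \alpha_1, \beta_0, \beta_1$ can reproduce it.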


Visualizing A Qubit: The Bloch Sphere 


In order to visualize how operations on qubits affect the quantum state, we first need a
geometric representation of the qubit. Recall that we need two complex numbers, $\alpha$ and $\beta$, to
represent the entire state of a qubit. Each complex number can be specified entirely by
two real numbers, so representing the qubit would seem to need four dimensions.
However, we are only able to visualize things in at most three dimensions. The Bloch sphere
is a way to visualize a qubit in three dimensions; its axes coincide with the so-called principal
axes of spin measurement, that is, x, y, and z. First, we rewrite the quantum state $|\psi\rangle$
using three real numbers (i.e., $a$, $b$, and $\varphi$) by factoring out a global phase, which has no
observable effect, and then reduce to two (i.e., $\theta$ and $\varphi$) after applying the normalization
condition:

$$|\psi\rangle = a|0\rangle + b e^{i\varphi}|1\rangle = \cos\frac{\theta}{2}|0\rangle + e^{i\varphi}\sin\frac{\theta}{2}|1\rangle.$$

This equation now has only two unknowns, $\theta$ and $\varphi$. This is enough to represent the
qubit in three dimensions using spherical coordinates with a fixed radius $r = 1$. That is, the
quantum state $|\psi\rangle$ is a vector in $\mathbb{R}^3$ given by $(1, \theta, \varphi)$. This can be visualized as a point on
the surface of the Bloch sphere as follows:








Equivalently, we can write a quantum state $|\psi\rangle$ in the Cartesian coordinates $(x, y, z)$:

$$\rho = |\psi\rangle\langle\psi| = \frac{1}{2}\left(I + x\sigma_x + y\sigma_y + z\sigma_z\right),$$

where $I$ is the identity matrix and $\sigma_x$, $\sigma_y$, $\sigma_z$ are the Pauli matrices. The above formula is also known
as the Bloch sphere representation of a quantum state.

We have found a one-to-one mapping from a (pure) single-qubit state $|\psi\rangle$ to a point
on the surface of the Bloch sphere. Here "pure" is to distinguish from another class of
quantum states called “mixed states.” A mixed state is a probability distribution over several
pure states, that is, $\rho = \sum_i p_i |\psi_i\rangle\langle\psi_i|$. All mixed states live inside the Bloch sphere.
No quantum state lives outside of the Bloch sphere.
It is important to note that a single-qubit Hilbert space is a two-dimensional complex
inner-product space, and the Bloch sphere visualizes a single-qubit state as a vector in $\mathbb{R}^3$.
Two orthogonal state vectors (i.e., with inner product equal to zero) point in opposite
directions on the Bloch sphere, such as $+z = |0\rangle$ and $-z = |1\rangle$ in the figure.
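
The Cartesian coordinates can be read off a state numerically as $x = \mathrm{tr}(\rho\sigma_x)$, $y = \mathrm{tr}(\rho\sigma_y)$, $z = \mathrm{tr}(\rho\sigma_z)$, which follows from the formula above because the Pauli matrices are traceless and square to the identity. The NumPy sketch below is an illustration; the helper name `bloch_vector` and the sample angles are assumptions of this example.

```python
import numpy as np

X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)

def bloch_vector(psi):
    """Return (x, y, z) with x = tr(rho X), etc., for rho = |psi><psi|."""
    psi = np.asarray(psi, dtype=complex)
    rho = np.outer(psi, psi.conj())
    return np.real([np.trace(rho @ P) for P in (X, Y, Z)])

ket0 = [1, 0]
ket1 = [0, 1]
plus = np.array([1, 1]) / np.sqrt(2)

# A generic state cos(theta/2)|0> + e^{i phi} sin(theta/2)|1> (sample angles).
theta, phi = 1.0, 2.0
generic = [np.cos(theta / 2), np.exp(1j * phi) * np.sin(theta / 2)]
spherical = [np.sin(theta) * np.cos(phi), np.sin(theta) * np.sin(phi), np.cos(theta)]

print(bloch_vector(ket0), bloch_vector(ket1), bloch_vector(plus))
print(np.allclose(bloch_vector(generic), spherical))
```

The orthogonal states $|0\rangle$ and $|1\rangle$ land on opposite poles, $(0, 0, 1)$ and $(0, 0, -1)$.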


2.2.3 MEASUREMENTS


How much information can we store in or get out of a single qubit? The amplitudes of a qubit
state $|\psi\rangle = \alpha|0\rangle + \beta|1\rangle$ are complex numbers, so there are infinitely many different states for
just a single qubit. Can we possibly encode/decode an infinite amount of information in one qubit then?
Not so fast. The quantum measurement postulate in quantum mechanics states that the only way to
read out information from a quantum system is by interacting with the system via measurement,
from which we obtain a probabilistic outcome. Formally, we can define the following process.


Definition 2.7 (Measurement). When we measure a qubit $|\psi\rangle = \alpha|0\rangle + \beta|1\rangle$, we observe
the basis state $|0\rangle$ with probability $|\alpha|^2$ and the basis state $|1\rangle$ with probability $|\beta|^2$.


The process of measurement is irreversible and probabilistic, meaning that once measurement
has occurred, the state $|\psi\rangle$ collapses into one of the two basis states ($|0\rangle$ or $|1\rangle$) and the original
quantum superposition cannot be recovered.


Example 2.8 If MeasZ is the measurement operator (along the computational axis), the
measurement outcome for each qubit, MeasZ $|\psi\rangle$, will be either $|0\rangle$ or $|1\rangle$ depending on its state. In
the Bloch sphere picture, the MeasZ outcome is related to the latitude of the quantum state;
the relative phase (longitude) does not matter (see Table 2.2).


Remark 2.9 (Measurement along arbitrary axis) So far, all the examples are measurements
along the z-axis. However, it is possible to measure along a different axis. To see that, let's
first rephrase Definition 2.7. Observe that $\langle 0|\psi\rangle = \begin{pmatrix} 1 & 0 \end{pmatrix}\begin{pmatrix} \alpha \\ \beta \end{pmatrix} = \alpha$. Similarly, we have $\langle 1|\psi\rangle = \beta$.
Therefore, MeasZ gives value $|0\rangle$ with probability $|\langle 0|\psi\rangle|^2$ and value $|1\rangle$ with probability
$|\langle 1|\psi\rangle|^2$. In general, suppose we have an orthonormal basis $B = \{|b_i\rangle\}$. Here, orthonormal
means that for all $i, j$, we have $\langle b_i|b_j\rangle = 1$ if $i = j$, and 0 otherwise. There exists a measurement
operator M along that basis, where we obtain measurement outcome $|b_i\rangle$ with probability
$|\langle b_i|\psi\rangle|^2$ for each $i$ (see Table 2.3).

In practice, we can accomplish measurement along a different axis than the computational
axis by applying a change-of-basis transformation U and then measuring in the computational axis
just as before. For example, measuring along the x-axis (i.e., in the $\{|+\rangle, |-\rangle\}$ basis), denoted
as MeasX, can be accomplished with Hadamard transformations H and z-axis measurements
MeasZ.

Table 2.2: Example measurement outcomes by MeasZ on initial state $|\psi\rangle$.


$|\psi\rangle = \alpha|00\rangle + \beta|01\rangle + \gamma|10\rangle + \delta|11\rangle$
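
The change-of-basis recipe from Remark 2.9 can be checked numerically for MeasX. The sketch below is an illustration with an arbitrarily chosen state; the function names `meas_z_probs` and `meas_x_probs` are assumptions of this example. It compares the probabilities obtained directly from the inner products with $|+\rangle$ and $|-\rangle$ against those obtained by applying H and then measuring along z.

```python
import numpy as np

H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)
plus = np.array([1, 1]) / np.sqrt(2)
minus = np.array([1, -1]) / np.sqrt(2)

def meas_z_probs(psi):
    """Probabilities of observing |0> and |1> (Definition 2.7)."""
    return np.abs(psi) ** 2

def meas_x_probs(psi):
    """Probabilities of observing |+> and |->, from |<b_i|psi>|^2."""
    return np.array([abs(np.vdot(plus, psi)) ** 2, abs(np.vdot(minus, psi)) ** 2])

psi = np.array([0.6, 0.8])          # example state 0.6|0> + 0.8|1>

direct = meas_x_probs(psi)          # measure in the {|+>, |->} basis
via_h = meas_z_probs(H @ psi)       # rotate with H, then measure along z

print(direct, via_h)                # both are approximately [0.98, 0.02]
```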














For completeness, we describe the general measurement rules for (pure) quantum states. To
start with, we pick a measurement basis set, which can be written as a set of matrices $\{M_i\}_i$
satisfying the completeness condition

$$\sum_i M_i^\dagger M_i = I.$$

For instance, for the computational basis measurement, we take $M_0 = |0\rangle\langle 0|$ and $M_1 = |1\rangle\langle 1|$.
Upon measurement, we obtain the outcome “i” with probability

$$\Pr[\text{observe } i] = \left\| M_i|\psi\rangle \right\|^2 = \langle\psi|M_i^\dagger M_i|\psi\rangle,$$

which results in a quantum state

$$|\psi'\rangle = \frac{M_i|\psi\rangle}{\left\| M_i|\psi\rangle \right\|} = \frac{M_i|\psi\rangle}{\sqrt{\langle\psi|M_i^\dagger M_i|\psi\rangle}}.$$


2.2.4 QUANTUM GATES 


But what are these transformations after all? We now formally introduce the quantum
gate postulate, that is, how we manipulate quantum states for computation. What kind of
quantum logic operations can we achieve? How do we transform from one quantum state to
another? Mathematically, this process is defined as a “norm-preserving linear transformation,” in
other words, a unitary transformation. Transforming from $|\psi\rangle = \alpha|0\rangle + \beta|1\rangle$ to
$|\varphi\rangle = \alpha'|0\rangle + \beta'|1\rangle$, we must have $|\alpha|^2 + |\beta|^2 = |\alpha'|^2 + |\beta'|^2 = 1$ to ensure that both $|\psi\rangle$ and $|\varphi\rangle$ are valid
quantum states. If we represent a quantum state $|\psi\rangle$ as a column vector as in Definition 2.1,
then we can represent the quantum logic gate on the state vector by a linear operator U given
by a matrix.





Definition 2.10 (Transformation). A valid logical transformation must map a quantum
state to another quantum state. That is, for $U : |\psi\rangle \mapsto U|\psi\rangle$, we require
$|\langle\psi|\psi\rangle|^2 = 1 = |\langle\psi|U^\dagger U|\psi\rangle|^2$. Formally, this means that U is represented by a unitary matrix (i.e., $U^\dagger U = I$).


Unlike measurement operators, which are irreversible and probabilistic, such a logical
transformation is reversible (since a unitary matrix U is always invertible) and deterministic (since U
maps any fixed initial state $|\psi\rangle$ to a unique final state $U|\psi\rangle$). Physically, it means that the
transformation is energetically coherent and we can always undo this process by inverting the action.
From an information theory perspective, it means that no information is destroyed (or leaked to
the environment) under unitary transformations. In other words, knowing the output and what
transformation it underwent, we can always recover the input. Notice that this is not always
the case in classical Boolean logic. Take a common logic gate, the AND gate, as an example:
knowing that we obtained the bit 0 from an AND operation of two bits x and y, i.e.,
AND(x, y) = 0, we cannot tell if we started with (x, y) = (0, 0) or (0, 1) or (1, 0). Hence, we call the
AND gate an irreversible gate. An example of a nontrivial classical reversible gate is the NOT
gate, which negates the two states 0 and 1. Transformations via quantum logic gates, however,
are all reversible. It is important to point out that the transformation principle does not account 
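for the effect of noise. For instance, a qubit, when perturbed by the environment, can decohere
to a classical state. Such a process is incoherent and not reversible. We will defer the discussion
on the effect of noise to Chapter 8. For simplicity, this chapter will assume an ideal, noise-free
situation.

Table 2.4: Example quantum gates and a selection of their algebraic properties.

Quantum Gate | Matrix Form | Truth Table | Algebraic Properties
Identity gate (I) | $\begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}$ | $\lvert 0\rangle \mapsto \lvert 0\rangle$, $\lvert 1\rangle \mapsto \lvert 1\rangle$ |
Not gate (X) | $\begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}$ | $\lvert 0\rangle \mapsto \lvert 1\rangle$, $\lvert 1\rangle \mapsto \lvert 0\rangle$ |
Y gate (Y) | $\begin{pmatrix} 0 & -i \\ i & 0 \end{pmatrix}$ | $\lvert 0\rangle \mapsto i\lvert 1\rangle$, $\lvert 1\rangle \mapsto -i\lvert 0\rangle$ |
Z gate (Z) | $\begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}$ | $\lvert 0\rangle \mapsto \lvert 0\rangle$, $\lvert 1\rangle \mapsto -\lvert 1\rangle$ |
Phase gate (S) | $\begin{pmatrix} 1 & 0 \\ 0 & i \end{pmatrix}$ | $\lvert 0\rangle \mapsto \lvert 0\rangle$, $\lvert 1\rangle \mapsto i\lvert 1\rangle$ | $S^2 = Z$
T gate (T) | $\begin{pmatrix} 1 & 0 \\ 0 & e^{i\pi/4} \end{pmatrix}$ | $\lvert 0\rangle \mapsto \lvert 0\rangle$, $\lvert 1\rangle \mapsto e^{i\pi/4}\lvert 1\rangle$ | $T^2 = S$, $TXT^\dagger = e^{-i\pi/4}SX$
Hadamard gate (H) | $\frac{1}{\sqrt{2}}\begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix}$ | $\lvert 0\rangle \mapsto \lvert +\rangle$, $\lvert 1\rangle \mapsto \lvert -\rangle$ | $HZH = X$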


Example 2.11 Quantum logic gates define the set of elementary operations that we can perform
in a quantum computer. Let's start with the simplest example, namely quantum gates on a
single qubit. A single-qubit gate can be viewed as a transformation that takes one point on the
Bloch sphere to another by rotating by an arbitrary angle about a certain axis. Table 2.4 shows a
few examples of single-qubit operations.

For example, when a qubit is in a superposition state $|\psi\rangle = \alpha|0\rangle + \beta|1\rangle$, the operation
applies to each of the basis states, e.g.,

$$H|\psi\rangle = \alpha(H|0\rangle) + \beta(H|1\rangle) = \alpha|+\rangle + \beta|-\rangle = \frac{\alpha + \beta}{\sqrt{2}}|0\rangle + \frac{\alpha - \beta}{\sqrt{2}}|1\rangle.$$

The X gate, Y gate, and Z gate are $\pi$ (or 180°) rotations about the x-axis, y-axis, and z-axis
of the Bloch sphere, respectively. The S gate performs a $\frac{\pi}{2}$ rotation about the z-axis, thus we have
$S^2 = Z$. The T gate performs a $\frac{\pi}{4}$ rotation about the z-axis, thus $T^2 = S$. The H (Hadamard) gate is a
$\pi$ rotation about an axis diagonal in the x-z plane.
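
The identities quoted in this example are easy to confirm numerically from the matrix forms in Table 2.4. The following NumPy sketch is only a sanity check; the test amplitudes `alpha` and `beta` and the helper `is_unitary` are assumptions of this example.

```python
import numpy as np

X = np.array([[0, 1], [1, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)
S = np.array([[1, 0], [0, 1j]], dtype=complex)
T = np.array([[1, 0], [0, np.exp(1j * np.pi / 4)]], dtype=complex)
H = np.array([[1, 1], [1, -1]], dtype=complex) / np.sqrt(2)

def is_unitary(U):
    """Check U^dagger U = I (Definition 2.10)."""
    return np.allclose(U.conj().T @ U, np.eye(U.shape[0]))

# Arbitrary normalized test amplitudes (illustrative values).
alpha, beta = 0.6, 0.8j
psi = np.array([alpha, beta])
h_psi = H @ psi
expected = np.array([alpha + beta, alpha - beta]) / np.sqrt(2)

checks = {
    "all gates unitary": all(is_unitary(U) for U in (X, Z, S, T, H)),
    "S^2 = Z": np.allclose(S @ S, Z),
    "T^2 = S": np.allclose(T @ T, S),
    "HZH = X": np.allclose(H @ Z @ H, X),
    "H|psi> formula": np.allclose(h_psi, expected),
}
print(checks)
```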


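Example 2.12 It is usually convenient to include generic single-qubit rotation gates (e.g.,
$R_x$, $R_y$, $R_z$ gates) along the Pauli axes in our gate set. We write $R_x(\theta)$ to indicate a rotation
by an angle $\theta$ about the x-axis. Several of the gates we've already discussed are just examples of the
$R_z(\theta)$ gates (up to a global phase), specifically the Z, S, and T gates, which rotate by angles
$\pi$, $\frac{\pi}{2}$, and $\frac{\pi}{4}$, respectively.
Formally, the rotation gates can be written in their matrix forms as follows:

$$R_x(\theta) = \cos\frac{\theta}{2} I - i\sin\frac{\theta}{2} X = \begin{pmatrix} \cos\frac{\theta}{2} & -i\sin\frac{\theta}{2} \\ -i\sin\frac{\theta}{2} & \cos\frac{\theta}{2} \end{pmatrix},$$

$$R_y(\theta) = \cos\frac{\theta}{2} I - i\sin\frac{\theta}{2} Y = \begin{pmatrix} \cos\frac{\theta}{2} & -\sin\frac{\theta}{2} \\ \sin\frac{\theta}{2} & \cos\frac{\theta}{2} \end{pmatrix},$$

$$R_z(\theta) = \cos\frac{\theta}{2} I - i\sin\frac{\theta}{2} Z = \begin{pmatrix} e^{-i\theta/2} & 0 \\ 0 & e^{i\theta/2} \end{pmatrix}.$$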
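
As a quick check of the relationship between the rotation gates and the named gates, the sketch below builds $R(\theta) = \cos\frac{\theta}{2} I - i\sin\frac{\theta}{2} P$ for a Pauli matrix $P$ and compares it with Z, S, and T up to a global phase. The helper names `rotation` and `equal_up_to_global_phase` are assumptions of this example.

```python
import numpy as np

I2 = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)
S = np.diag([1, 1j])
T = np.diag([1, np.exp(1j * np.pi / 4)])

def rotation(pauli, theta):
    """R(theta) = cos(theta/2) I - i sin(theta/2) P for a Pauli matrix P."""
    return np.cos(theta / 2) * I2 - 1j * np.sin(theta / 2) * pauli

def equal_up_to_global_phase(A, B, tol=1e-12):
    """True if A = e^{i*phi} B for some real phi."""
    idx = int(np.argmax(np.abs(B) > tol))      # flat index of first nonzero entry of B
    phase = A.flat[idx] / B.flat[idx]
    return bool(np.isclose(abs(phase), 1.0) and np.allclose(A, phase * B))

theta = 0.7                                    # arbitrary sample angle
rz_matrix = np.diag([np.exp(-1j * theta / 2), np.exp(1j * theta / 2)])
ry_matrix = np.array([[np.cos(theta / 2), -np.sin(theta / 2)],
                      [np.sin(theta / 2), np.cos(theta / 2)]])

print(np.allclose(rotation(Z, theta), rz_matrix))
print(np.allclose(rotation(Y, theta), ry_matrix))
print(equal_up_to_global_phase(rotation(Z, np.pi), Z),
      equal_up_to_global_phase(rotation(Z, np.pi / 2), S),
      equal_up_to_global_phase(rotation(Z, np.pi / 4), T))
```

For instance, $R_z(\pi) = -iZ$, so the two differ only by the unobservable global phase $-i$.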


Example 2.13 Two-qubit gates take two qubits as inputs. They typically have an “entangling”
effect: the operation applied to one qubit is dependent on the state of the other qubit; in
other words, they are conditional gates. Among the most common two-qubit operations are
the controlled-not gate (or CNOT gate), and the controlled-phase gate (or CZ gate), as shown
in Table 2.5.

In the example, the CNOT gate is a two-input two-output gate which performs a NOT
operation on the second (target) qubit only when the first (control) qubit is $|1\rangle$. Similarly for
the CZ gate, if the control qubit is $|1\rangle$, then we apply a Z gate to the target qubit. But looking at
the truth table of the CZ gate, we notice that, in fact, it makes no distinction between the first
and the second qubits: a phase is accumulated only for the $|11\rangle$ basis state. Hence, the CZ gate has a
symmetric circuit symbol. One can in fact implement a CNOT gate with a CZ gate and vice
versa. For example, CNOT is equivalent to a CZ gate with a Hadamard gate on the target qubit
on both sides, since HZH = X:

$$\mathrm{CNOT} = (I \otimes H)\,\mathrm{CZ}\,(I \otimes H).$$


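Table 2.5: Example two-qubit quantum gates.

Quantum Gate | Matrix Form | Truth Table
Controlled-not gate (CNOT) | $\begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 1 & 0 \end{pmatrix}$ | $\lvert 00\rangle \mapsto \lvert 00\rangle$, $\lvert 01\rangle \mapsto \lvert 01\rangle$, $\lvert 10\rangle \mapsto \lvert 11\rangle$, $\lvert 11\rangle \mapsto \lvert 10\rangle$
Controlled-phase gate (CZ) | $\begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & -1 \end{pmatrix}$ | $\lvert 00\rangle \mapsto \lvert 00\rangle$, $\lvert 01\rangle \mapsto \lvert 01\rangle$, $\lvert 10\rangle \mapsto \lvert 10\rangle$, $\lvert 11\rangle \mapsto -\lvert 11\rangle$

The fact that these gates are conditional gates can also be observed from their matrix
representations. In general, we may construct a controlled version of any gate U. Notice that a
controlled-U gate can be written as the sum of two terms, namely, when the first qubit is $|0\rangle$,
nothing happens to the second qubit, and when the first qubit is $|1\rangle$, we apply the U gate on
the second qubit:

$$\text{controlled-}U = \Lambda(U) = |0\rangle\langle 0| \otimes I + |1\rangle\langle 1| \otimes U = \begin{pmatrix} I & 0 \\ 0 & U \end{pmatrix},$$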


where the notation $\Lambda(\cdot)$ stands for a controlled version of a gate. One can quickly verify that these
controlled gates usually have an entangling effect. In particular, they can transform a product
state as input into an entangled state as output. For example, the following circuit produces the
Bell state $\frac{1}{\sqrt{2}}(|00\rangle + |11\rangle)$:

|0⟩ ──[H]──●──
|0⟩ ───────⊕──


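Another gate, important in architectures that require qubits to be adjacent in order to
perform multi-qubit operations, is the SWAP gate, which exchanges the states of two qubits.
It is equivalent to three CNOT gates with alternating control and target qubits:

$$\mathrm{SWAP} = \mathrm{CNOT}_{1 \to 2}\,\mathrm{CNOT}_{2 \to 1}\,\mathrm{CNOT}_{1 \to 2},$$

where $\mathrm{CNOT}_{a \to b}$ denotes a CNOT gate with control qubit $a$ and target qubit $b$.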
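
The constructions in Example 2.13 can all be verified with 4 × 4 matrices. The NumPy sketch below is an illustration; the helper name `controlled` and the variable names are assumptions of this example, and the first qubit is taken as the control, as in the text.

```python
import numpy as np

I2 = np.eye(2)
X = np.array([[0, 1], [1, 0]])
Z = np.array([[1, 0], [0, -1]])
H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)
P0 = np.array([[1, 0], [0, 0]])   # projector |0><0|
P1 = np.array([[0, 0], [0, 1]])   # projector |1><1|

def controlled(U):
    """Lambda(U) = |0><0| (x) I + |1><1| (x) U, with the first qubit as control."""
    return np.kron(P0, I2) + np.kron(P1, U)

CNOT = controlled(X)
CZ = controlled(Z)

# CNOT from CZ: Hadamards on the target qubit before and after the CZ.
cnot_from_cz = np.kron(I2, H) @ CZ @ np.kron(I2, H)

# Bell state: H on the first qubit of |00>, then CNOT.
ket00 = np.array([1, 0, 0, 0])
bell = CNOT @ np.kron(H, I2) @ ket00

# SWAP from three CNOTs; CNOT_21 uses the second qubit as control.
CNOT_21 = np.kron(I2, P0) + np.kron(X, P1)
swap_from_cnots = CNOT @ CNOT_21 @ CNOT
SWAP = np.array([[1, 0, 0, 0],
                 [0, 0, 1, 0],
                 [0, 1, 0, 0],
                 [0, 0, 0, 1]])

print(np.allclose(cnot_from_cz, CNOT))
print(bell)
print(np.allclose(swap_from_cnots, SWAP))
```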























It can be shown that single-qubit gates and two-qubit gates are universal for arbitrary
quantum logic. In other words, any unitary gate on multiple qubits can be decomposed into a
sequence of one- and two-qubit gates. One example of a universal gate set found commonly in
the literature is

$$G = \{H, T, \mathrm{CNOT}\}.$$


Physically, realizing a multi-qubit gate is extremely challenging. So finding an efficient 
decomposition of a unitary gate into a sequence of smaller unitary gates from a chosen gate 
set is critical to the success of executing a quantum circuit. This problem is often referred to as 
quantum compilation. We will revisit exactly this problem but in much greater detail in Chapter 6. 


Example 2.14 Three-qubit gates. These gates may be controlled on more than one qubit. One
of the most famous examples is the Toffoli gate (CCNOT or controlled-controlled-not). It
has the following circuit:

|x1⟩ ──●──
|x2⟩ ──●──
|x3⟩ ──⊕──

The Toffoli gate flips the third qubit only when the first two qubits are both $|1\rangle$. It can therefore be
used to achieve irreversible classical operations like AND and OR in quantum computing in a
reversible manner.
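
On computational basis states, the Toffoli gate acts as a reversible function on three classical bits. The sketch below is an illustration of how AND and OR can be embedded; the function names and the use of a preset third bit as the output are assumptions of this example.

```python
from itertools import product

def toffoli(x1, x2, x3):
    """CCNOT on classical bits: flip x3 only when x1 and x2 are both 1."""
    return x1, x2, x3 ^ (x1 & x2)

def reversible_and(x, y):
    """AND(x, y) read from the third bit, which starts at 0."""
    return toffoli(x, y, 0)[2]

def reversible_or(x, y):
    """OR via De Morgan: negate the inputs and start the third bit at 1."""
    return toffoli(1 - x, 1 - y, 1)[2]

inputs = list(product((0, 1), repeat=3))
outputs = [toffoli(*bits) for bits in inputs]

# Reversibility: the map is a permutation of the 8 inputs and is its own inverse.
is_permutation = sorted(outputs) == inputs
is_self_inverse = all(toffoli(*toffoli(*bits)) == bits for bits in inputs)

print(is_permutation, is_self_inverse)
print([reversible_and(x, y) for x, y in product((0, 1), repeat=2)])  # [0, 0, 0, 1]
print([reversible_or(x, y) for x, y in product((0, 1), repeat=2)])   # [0, 1, 1, 1]
```

Unlike the plain AND gate, the inputs are carried through to the output, so no information is lost.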


2.3 NOISY QUANTUM SYSTEMS


How do quantum systems interact with the environment? How do we characterize a quantum 
process in the presence of noise? In this section, we extend our discussion to include noisy quan- 
tum systems. 

To begin with, we emphasize that a quantum system is inherently probabilistic: when
implemented in practice, quantum systems have to involve some incoherent processes, whether
they are intentional (e.g., by measurements) or unintentional (e.g., by random perturbations),
and they ultimately lead to probabilistic outcomes. For example, an ideal system can prepare
a quantum state $|\psi\rangle$ with certainty using a perfect quantum circuit, while a noisy quantum
system likely produces a random distribution of quantum states, i.e., $|\psi_i\rangle$ with probability $p_i$,
due to imprecise controls or environmental noise. To model these effects, we need a more general
definition of a probabilistic quantum state.


2.3.1 QUANTUM PROBABILITY 


Recall that upon measurement of a quantum state $|\psi\rangle = \sum_i \alpha_i |x_i\rangle$, we obtain a classical
probability distribution over the measurement outcomes $\{|x_i\rangle\}_i$, according to the probabilities $(|\alpha_i|^2)_i$.
Indeed, the probabilities sum to 1, i.e., $\langle\psi|\psi\rangle = \sum_i |\alpha_i|^2 = 1$. We can rewrite the
probability expression for the outcome $x_i$ as

$$\Pr[\mathrm{Meas}(|\psi\rangle) = |x_i\rangle] = |\alpha_i|^2 = |\langle\psi|x_i\rangle|^2 = \langle\psi|x_i\rangle\langle x_i|\psi\rangle.$$

Let us denote $\Pi_{x_i} = |x_i\rangle\langle x_i|$ as a projector, as the effect of $\Pi_{x_i}$ is a projection onto the
subspace spanned by $|x_i\rangle$. Note that $\Pi$ is a projection if it can be written as $\Pi = \sum_k |v_k\rangle\langle v_k|$
where the $v_k$'s are orthonormal (i.e., $\langle v_i|v_j\rangle = \delta_{ij}$, where $\delta_{ij}$ is the Kronecker delta). Equivalently,
$\Pi$ is a positive semidefinite (PSD) matrix such that $\Pi^2 = \Pi$, i.e., projecting twice is identical
to projecting once.

We want to find the right notion of a noisy quantum state. We start by modeling it as a
probability distribution over pure quantum states, $\{p_i, |\psi_i\rangle\}_i$.


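Definition 2.15 The mixed state $\{p_i, |\psi_i\rangle\}$ of a quantum system is represented by the matrix

$$\rho = \sum_i p_i |\psi_i\rangle\langle\psi_i|.$$

This is called the density matrix representation of a quantum state. Furthermore, if $\rho$
represents a mixed quantum state, it must satisfy $\mathrm{tr}(\rho) = 1$, and $\rho$ is positive semidefinite (PSD),
where $\mathrm{tr}(\cdot)$ is the trace of a matrix. Upon measuring this mixed state $\rho$, we obtain

$$\begin{aligned}
\Pr[\mathrm{Meas}(\rho) = |x_i\rangle] &= \sum_j p_j \Pr[\mathrm{Meas}(|\psi_j\rangle) = |x_i\rangle] \\
&= \sum_j p_j \langle\psi_j|\Pi_{x_i}|\psi_j\rangle \\
&= \mathrm{tr}\Big(\sum_j p_j |\psi_j\rangle\langle\psi_j| \Pi_{x_i}\Big) \\
&= \mathrm{tr}(\rho\,\Pi_{x_i}).
\end{aligned}$$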
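
The derivation above can be checked on a small ensemble. The NumPy sketch below is an illustration: the ensemble (equal mixture of $|0\rangle$ and $|+\rangle$) and the helper name `density_matrix` are assumptions of this example. It builds $\rho$, verifies the trace and PSD conditions, and compares $\mathrm{tr}(\rho\,\Pi_{x_i})$ with the probability-weighted average over the pure states.

```python
import numpy as np

ket0 = np.array([1, 0], dtype=complex)
plus = np.array([1, 1], dtype=complex) / np.sqrt(2)

def density_matrix(ensemble):
    """rho = sum_i p_i |psi_i><psi_i| for an ensemble [(p_i, psi_i), ...]."""
    return sum(p * np.outer(psi, psi.conj()) for p, psi in ensemble)

# Illustrative ensemble: |0> or |+>, each with probability 1/2.
ensemble = [(0.5, ket0), (0.5, plus)]
rho = density_matrix(ensemble)

# Probability of outcome |0>, computed two ways.
proj0 = np.outer(ket0, ket0.conj())                      # projector onto |0>
prob_from_rho = np.trace(rho @ proj0).real               # tr(rho Pi)
prob_from_average = sum(p * abs(np.vdot(ket0, psi)) ** 2 for p, psi in ensemble)

trace_is_one = bool(np.isclose(np.trace(rho).real, 1.0))
is_psd = bool(np.all(np.linalg.eigvalsh(rho) >= -1e-12))

print(prob_from_rho, prob_from_average)   # both approximately 0.75
print(trace_is_one, is_psd)
```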


Let us now extend the measurement rules in Section 2.2.3 to mixed states. Again, we start
with a set of measurement operators $\{M_i\}_i$ satisfying the completeness condition $\sum_i M_i^\dagger M_i = I$.
We obtain the measurement outcome “i” with probability

$$\Pr[\text{observe } i] = \mathrm{tr}(\rho M_i^\dagger M_i) = \mathrm{tr}(M_i \rho M_i^\dagger),$$

where we used the property $\mathrm{tr}(AB) = \mathrm{tr}(BA)$. The resulting mixed quantum state is

$$\rho' = \frac{M_i \rho M_i^\dagger}{\mathrm{tr}(M_i \rho M_i^\dagger)}.$$

To tie closely with the notions of classical probability, we can define the measurement
process as observables, and model the outcomes using the expectations of the observables.
In particular, suppose we perform a measurement $\{M_1, \ldots, M_k\}$ on a quantum state $\rho$.
We report a value $\lambda_i$ if measurement outcome $i$ is received. This is denoted as an observable
$O = \lambda_1 M_1 + \lambda_2 M_2 + \cdots + \lambda_k M_k$ (taking the $M_i$ to be projectors, so that
$M_i^\dagger M_i = M_i$). As a result, we obtain a random variable $x$ that takes value
$\lambda_i$ with probability $\mathrm{tr}(\rho M_i^\dagger M_i)$. The expectation of the observable $O$ with respect to
state $\rho$ is

$$E[O] = E[x] = \mathrm{tr}(\rho O).$$
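
The measurement rule and the expectation formula can be exercised together in a few lines. The sketch
below is only an illustration under stated assumptions: the state `rho`, the projectors `m0` and `m1`,
the reported values `lam`, and the helper `measure` are all chosen for the example and are not taken
from the text.

```python
def matmul(a, b):
    """Multiply two square matrices given as nested lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

def dagger(a):
    """Conjugate transpose."""
    n = len(a)
    return [[a[j][i].conjugate() for j in range(n)] for i in range(n)]

def trace(a):
    return sum(a[i][i] for i in range(len(a)))

def measure(rho, m):
    """Return (probability, post-measurement state) for measurement operator m."""
    unnormalized = matmul(matmul(m, rho), dagger(m))
    prob = trace(unnormalized).real
    post = [[x / prob for x in row] for row in unnormalized]
    return prob, post

# Assumed example: computational-basis projectors and a mixed single-qubit state.
m0 = [[1, 0], [0, 0]]
m1 = [[0, 0], [0, 1]]
rho = [[0.75, 0.25], [0.25, 0.25]]

p0, post0 = measure(rho, m0)
p1, post1 = measure(rho, m1)

# Observable that reports +1 for outcome 0 and -1 for outcome 1 (the Pauli Z matrix).
lam = [+1, -1]
obs = [[lam[0] * m0[r][c] + lam[1] * m1[r][c] for c in range(2)] for r in range(2)]
expectation = trace(matmul(rho, obs)).real
print(p0, p1, expectation)
```

Here `expectation` equals `lam[0] * p0 + lam[1] * p1`, which is the classical expectation of the reported
value.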
Now we turn to the mixed state that results when a pure quantum state is partially measured.
Suppose an n-qubit quantum state is shared between two parties, e.g., Alice holds
on to the first $n/2$ qubits and Bob holds on to the rest. It is then natural to ask: what is the state
of Alice’s qubits, if Bob measured his qubits and obtained a probabilistic outcome?


Definition 2.16 Given a bipartite quantum system in the form of a $d^2 \times d^2$ matrix $A \otimes B$
(where $d = 2^{n/2}$, and $A$ and $B$ are $d \times d$ matrices), the partial trace (over B) of the system is defined
as

$$\mathrm{tr}_B(A \otimes B) = A \cdot \mathrm{tr}(B).$$


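For a generic pure state (possibly entangled between A and B), the density matrix form is

$$\begin{aligned}
\rho = |\psi\rangle\langle\psi| &= \Big(\sum_{i_A, i_B} \alpha_{i_A i_B} |i_A\rangle \otimes |i_B\rangle\Big) \Big(\sum_{j_A, j_B} \alpha^*_{j_A j_B} \langle j_A| \otimes \langle j_B|\Big) \\
&= \sum_{i_A, i_B, j_A, j_B} \alpha_{i_A i_B} \alpha^*_{j_A j_B} |i_A\rangle\langle j_A| \otimes |i_B\rangle\langle j_B|.
\end{aligned}$$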


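After Bob measures his qubits, Alice's state becomes the partial trace of the quantum state
over Bob’s subsystem:

$$\begin{aligned}
\rho_A &= \mathrm{tr}_B(|\psi\rangle\langle\psi|) \\
&= \sum_{i_A, i_B, j_A, j_B} \alpha_{i_A i_B} \alpha^*_{j_A j_B} |i_A\rangle\langle j_A| \,\mathrm{tr}(|i_B\rangle\langle j_B|) \\
&= \sum_m \sum_{i_A, j_A} \alpha_{i_A m} \alpha^*_{j_A m} |i_A\rangle\langle j_A|.
\end{aligned}$$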
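
To make the partial trace concrete, here is a small Python sketch that traces out Bob's qubit from a
two-qubit pure state. The index convention (basis index $i_A \cdot d_B + i_B$), the two example states, and
the helper names `outer` and `partial_trace_b` are assumptions made for the illustration.

```python
import math

def outer(v, w):
    """Return the matrix |v><w| as a nested list."""
    return [[complex(a) * complex(b).conjugate() for b in w] for a in v]

def partial_trace_b(rho, d_a, d_b):
    """Trace out subsystem B; the joint basis index is i_a * d_b + i_b."""
    return [[sum(rho[ia * d_b + m][ja * d_b + m] for m in range(d_b))
             for ja in range(d_a)] for ia in range(d_a)]

s = 1 / math.sqrt(2)

# Entangled example: the Bell state (|00> + |11>) / sqrt(2).
bell = [s, 0, 0, s]
rho_a_bell = partial_trace_b(outer(bell, bell), 2, 2)

# Unentangled example: |0> (x) |+>, whose reduced state on A stays pure.
product = [s, s, 0, 0]
rho_a_product = partial_trace_b(outer(product, product), 2, 2)

print(rho_a_bell)
print(rho_a_product)
```

For the Bell state, Alice is left with the maximally mixed state $I/2$, while for the product state she
keeps the pure state $|0\rangle\langle 0|$, in line with $\mathrm{tr}_B(A \otimes B) = A \cdot \mathrm{tr}(B)$.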


To quantitatively study the impact of noise, we need distance measures between quantum 
states or quantum processes. We postpone the technical details to Section 8.1 where we discuss 
the noise mitigation strategies. 


2.3.2 OPERATOR SUM REPRESENTATION


Our goal in this section is to model the interaction between a quantum system and the environment,
and thus describe the impact of noise on a quantum state. Fortunately, we already have
all the tools we need: namely unitary transformation and measurement. In particular, we can
view the impact of environmental noise in the following “unitary coupling evolution” picture,
as shown in Figure 2.5.

Figure 2.5: The unitary coupling picture between the system and the environment (input $\rho_{in}$,
environment $\rho_{env} = |e\rangle\langle e|$, output $\rho_{out}$).

A unitary transformation $U$ is applied to both the environment and the system ($\rho_{env} \otimes \rho_{in}$)
(Figure 2.5), followed by an implicit partial measurement over the environment. Notice that
we write the environment first in the tensor product, for the sake of convenience in later discussions.

More generally, any physical process that can happen to a mixed quantum state can be
written as a linear map:

$$\rho \mapsto \mathcal{E}(\rho).$$

Such a linear map is sometimes referred to as a superoperator. For example, a unitary
transformation (without interaction with the environment) can be written in terms of a unitary operator:
$\mathcal{E}(\rho) = U \rho U^\dagger$.

The goal is to write down an operator form for the entire unitary coupling evolution (which
involves $\rho_{env}$, a unitary transformation $U$, and a measurement):

$$\rho_{in} \mapsto \rho_{out} = \mathrm{tr}_{env}\big(U (\rho_{env} \otimes \rho_{in}) U^\dagger\big).$$


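Here $U$ acts on both the system and the environment. Suppose we start with $\rho_{env} = |e\rangle\langle e|$, and
arbitrary measurement operators $M_k = |e_k\rangle\langle e_k|$, such that the $|e_k\rangle$’s form an orthonormal basis for
the space of the environment. Then we have

$$\begin{aligned}
\mathcal{E}(\rho) &= \mathrm{tr}_{env}\big(U (\rho_{env} \otimes \rho) U^\dagger\big) \\
&= \sum_k \langle e_k| U (|e\rangle\langle e| \otimes \rho) U^\dagger |e_k\rangle.
\end{aligned}$$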


The key step is to define an operator $E_k = \langle e_k| U |e\rangle$. Intuitively, we take the unitary $U$
(acting on both the system and the environment) and cut it into separate operators $E_k$, each
acting on just the system:

$$U(|e\rangle \otimes |\psi\rangle) = \sum_k |e_k\rangle \otimes E_k |\psi\rangle.$$

Therefore, the overall linear map can be rewritten in terms of the operators:

$$\mathcal{E}(\rho) = \sum_k E_k \rho E_k^\dagger.$$
This is called the operator sum representation (OSR) of a quantum process [85], and the $E_k$’s
are commonly named the Kraus operators. The condition for a set of valid Kraus operators is
$\sum_k E_k^\dagger E_k = I$. Notice that the operators often are non-unitary matrices. It is also important
to note that the OSR is non-unique for a given linear map $\mathcal{E}(\rho)$, because one can convert to
another set of operators, $F_k$, by changing the basis in the measurements, $M_k$, for the environment.
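
The construction $E_k = \langle e_k| U |e\rangle$ can be checked numerically. The sketch below assumes a one-qubit
environment written first in the tensor product, an environment that starts in $|0\rangle$, and a coupling
unitary that flips the environment qubit when the system qubit is $|1\rangle$; this particular `u`, and the
helper names, are illustrative choices rather than anything prescribed by the text.

```python
def matmul(a, b):
    """Multiply two square matrices given as nested lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

def dagger(a):
    n = len(a)
    return [[a[j][i].conjugate() for j in range(n)] for i in range(n)]

def kraus_from_unitary(u, d_env, d_sys, e=0):
    """E_k = <e_k| U |e>, with joint basis index env * d_sys + sys."""
    return [[[u[k * d_sys + r][e * d_sys + c] for c in range(d_sys)]
             for r in range(d_sys)] for k in range(d_env)]

def apply_kraus(kraus, rho):
    """Operator sum representation: sum_k E_k rho E_k^dagger."""
    n = len(rho)
    out = [[0j] * n for _ in range(n)]
    for e_k in kraus:
        term = matmul(matmul(e_k, rho), dagger(e_k))
        for r in range(n):
            for c in range(n):
                out[r][c] += term[r][c]
    return out

def apply_by_coupling(u, rho, d_env, d_sys, e=0):
    """tr_env(U (|e><e| (x) rho) U^dagger), computed directly."""
    dim = d_env * d_sys
    joint = [[0j] * dim for _ in range(dim)]
    for r in range(d_sys):
        for c in range(d_sys):
            joint[e * d_sys + r][e * d_sys + c] = rho[r][c]
    evolved = matmul(matmul(u, joint), dagger(u))
    return [[sum(evolved[k * d_sys + r][k * d_sys + c] for k in range(d_env))
             for c in range(d_sys)] for r in range(d_sys)]

# Assumed coupling: the system qubit controls a bit flip on the environment qubit.
u = [[1, 0, 0, 0],
     [0, 0, 0, 1],
     [0, 0, 1, 0],
     [0, 1, 0, 0]]
kraus = kraus_from_unitary(u, 2, 2)
rho_plus = [[0.5, 0.5], [0.5, 0.5]]  # |+><+|

out_kraus = apply_kraus(kraus, rho_plus)
out_direct = apply_by_coupling(u, rho_plus, 2, 2)
completeness = [[sum(matmul(dagger(e_k), e_k)[r][c] for e_k in kraus) for c in range(2)]
                for r in range(2)]
print(out_kraus, out_direct, completeness)
```

For this choice of coupling the Kraus operators come out as the projectors $|0\rangle\langle 0|$ and $|1\rangle\langle 1|$, so the
channel removes the off-diagonal terms of $\rho$; both ways of computing the output agree.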


2.3.3 QUBIT DECOHERENCE AND GATE NOISE 


In this section, we use the operator sum representation to model some examples of quantum 
noise, namely the amplitude damping noise, the phase damping noise, and the depolarizing 
noise. 


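Definition 2.17 An amplitude damping noise can be represented as

$$\mathcal{E}(\rho) = E_0 \rho E_0^\dagger + E_1 \rho E_1^\dagger,$$

where $E_0 = \begin{pmatrix} 1 & 0 \\ 0 & \sqrt{1-\gamma} \end{pmatrix}$ and $E_1 = \begin{pmatrix} 0 & \sqrt{\gamma} \\ 0 & 0 \end{pmatrix}$ for some parameter $\gamma$ between 0 and 1.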

This model captures the general behavior of a quantum system losing energy. For instance, the
spontaneous emission of electromagnetic radiation for an atom can be modeled as amplitude
damping, where $\gamma$ is the probability of emission. The effect of energy loss can be seen from the
fact that $E_1$ brings the amplitude on $|1\rangle$ (excited state) to $|0\rangle$ (ground state).
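
A short numerical sketch of amplitude damping follows. It assumes the standard Kraus operators
$E_0 = \mathrm{diag}(1, \sqrt{1-\gamma})$ and $E_1 = \sqrt{\gamma}\,|0\rangle\langle 1|$, an arbitrary example value $\gamma = 0.3$, and hypothetical
helper names; it is meant only to show the population moving from $|1\rangle$ to $|0\rangle$.

```python
import math

def matmul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

def dagger(a):
    n = len(a)
    return [[a[j][i].conjugate() for j in range(n)] for i in range(n)]

def apply_kraus(kraus, rho):
    """Operator sum representation: sum_k E_k rho E_k^dagger."""
    n = len(rho)
    out = [[0j] * n for _ in range(n)]
    for e_k in kraus:
        term = matmul(matmul(e_k, rho), dagger(e_k))
        for r in range(n):
            for c in range(n):
                out[r][c] += term[r][c]
    return out

def amplitude_damping(gamma):
    """Kraus operators of amplitude damping with emission probability gamma."""
    e0 = [[1, 0], [0, math.sqrt(1 - gamma)]]
    e1 = [[0, math.sqrt(gamma)], [0, 0]]
    return [e0, e1]

gamma = 0.3                   # assumed example value
excited = [[0, 0], [0, 1]]    # the state |1><1|
damped = apply_kraus(amplitude_damping(gamma), excited)
print(damped[0][0].real, damped[1][1].real)  # populations of |0> and |1>
```

Starting from the excited state, the output has population $\gamma$ in $|0\rangle$ and $1-\gamma$ in $|1\rangle$, matching the
reading of $\gamma$ as the probability of emission.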

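In a more realistic setting, the parameter $\gamma$ is a time-dependent function, which is often
characterized by $1 - e^{-t/T_1}$, where $t$ is time and $T_1$ is called the “spin-lattice relaxation time” or
the “$T_1$ coherence time.” As time goes by, a quantum state is, therefore, increasingly likely
to undergo energy loss, and $T_1$ is a parameter characterizing the speed of such a process. More
specifically, the generalized version of amplitude damping describes the $T_1$ relaxation process,
where the Kraus operators are:

$$E_0 = \sqrt{p}\begin{pmatrix} 1 & 0 \\ 0 & \sqrt{1-\gamma} \end{pmatrix}, \quad
E_1 = \sqrt{p}\begin{pmatrix} 0 & \sqrt{\gamma} \\ 0 & 0 \end{pmatrix}, \quad
E_2 = \sqrt{1-p}\begin{pmatrix} \sqrt{1-\gamma} & 0 \\ 0 & 1 \end{pmatrix}, \quad
E_3 = \sqrt{1-p}\begin{pmatrix} 0 & 0 \\ \sqrt{\gamma} & 0 \end{pmatrix},$$

where the state converges to the mixed state $\rho_\infty = \begin{pmatrix} p & 0 \\ 0 & 1-p \end{pmatrix}$.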


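Definition 2.18 A phase damping noise can be represented as

$$\mathcal{E}(\rho) = E_0 \rho E_0^\dagger + E_1 \rho E_1^\dagger,$$

where $E_0 = \begin{pmatrix} 1 & 0 \\ 0 & \sqrt{1-\lambda} \end{pmatrix}$ and $E_1 = \begin{pmatrix} 0 & 0 \\ 0 & \sqrt{\lambda} \end{pmatrix}$ for some parameter $\lambda$ between 0 and 1.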


In a realistic setting, phase damping is thought to be related to the loss of quantum information
without any loss of energy. Equivalently, it is a process where the qubit undergoes a phase
flip (i.e., Z gate) with probability $(1 - \sqrt{1-\lambda})/2$. Similarly, the parameter $\lambda$ in phase damping is
often characterized by a time-dependent function $1 - e^{-t/T_2}$, where $T_2$ is called the “spin-spin
relaxation time” or the “$T_2$ coherence time.” In general, since amplitude damping contributes to
both $T_1$ and $T_2$ relaxation [86], for a system with both amplitude and phase damping, we have
$T_2 \le 2T_1$. To accurately capture the behavior of qubit decoherence, $T_1$ and $T_2$ are typically
characterized separately during idle and gate time, as qubits usually decohere faster during gate time.
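
The equivalence between phase damping and a probabilistic phase flip can be verified directly. The
sketch below assumes the standard phase damping Kraus operators $E_0 = \mathrm{diag}(1, \sqrt{1-\lambda})$ and
$E_1 = \mathrm{diag}(0, \sqrt{\lambda})$, a phase flip channel that applies $Z$ with probability $q$, and an arbitrary example
value $\lambda = 0.36$; the helper names are hypothetical.

```python
import math

def matmul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

def dagger(a):
    n = len(a)
    return [[a[j][i].conjugate() for j in range(n)] for i in range(n)]

def apply_kraus(kraus, rho):
    n = len(rho)
    out = [[0j] * n for _ in range(n)]
    for e_k in kraus:
        term = matmul(matmul(e_k, rho), dagger(e_k))
        for r in range(n):
            for c in range(n):
                out[r][c] += term[r][c]
    return out

def phase_damping(lam):
    return [[[1, 0], [0, math.sqrt(1 - lam)]],
            [[0, 0], [0, math.sqrt(lam)]]]

def phase_flip(q):
    """Identity with probability 1 - q, Pauli Z with probability q."""
    a, b = math.sqrt(1 - q), math.sqrt(q)
    return [[[a, 0], [0, a]],
            [[b, 0], [0, -b]]]

lam = 0.36                              # assumed example value
q = (1 - math.sqrt(1 - lam)) / 2        # matching phase flip probability
rho_plus = [[0.5, 0.5], [0.5, 0.5]]     # |+><+|, maximal coherence

out_damping = apply_kraus(phase_damping(lam), rho_plus)
out_flip = apply_kraus(phase_flip(q), rho_plus)
print(out_damping[0][1].real, out_flip[0][1].real)
```

Both channels leave the diagonal untouched and shrink the off-diagonal terms by the same factor
$\sqrt{1-\lambda} = 1 - 2q$, so no energy is lost while coherence decays.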

Another commonly studied model for capturing gate noise is the depolarizing noise (or 
sometimes referred to as a special case of stochastic Pauli noise). 


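Definition 2.19 A depolarizing noise can be represented as

$$\mathcal{E}(\rho) = (1 - p) I \rho I + \frac{p}{3} X \rho X + \frac{p}{3} Y \rho Y + \frac{p}{3} Z \rho Z,$$

so the corresponding Kraus operators are $\{\sqrt{1-p}\, I, \sqrt{p/3}\, X, \sqrt{p/3}\, Y, \sqrt{p/3}\, Z\}$.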


This can be interpreted as a process where the state $\rho$ is unchanged with probability $1 - p$,
and has $X$, $Y$, or $Z$ applied to it with equal probability $p/3$. Due to the observation that
$I/2 = (\rho + X\rho X + Y\rho Y + Z\rho Z)/4$ for arbitrary $\rho$, we can rewrite

$$\mathcal{E}(\rho) = \Big(1 - \frac{4p}{3}\Big) \rho + \frac{4p}{3} \cdot \frac{I}{2},$$

which can be equivalently interpreted as the quantum state being unchanged with probability $1 - \frac{4p}{3}$,
and replaced by $\frac{I}{2}$ with probability $\frac{4p}{3}$.
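
The two forms of the depolarizing channel can be compared numerically. In the sketch below, the state
`rho`, the value `p = 0.15`, and the function names `depolarize` and `depolarize_mixing` are assumptions
chosen for the example.

```python
I2 = [[1, 0], [0, 1]]
X = [[0, 1], [1, 0]]
Y = [[0, -1j], [1j, 0]]
Z = [[1, 0], [0, -1]]

def matmul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

def depolarize(rho, p):
    """(1-p) rho + (p/3)(X rho X + Y rho Y + Z rho Z); the Paulis are Hermitian."""
    out = [[0j, 0j], [0j, 0j]]
    for weight, op in [(1 - p, I2), (p / 3, X), (p / 3, Y), (p / 3, Z)]:
        term = matmul(matmul(op, rho), op)
        for r in range(2):
            for c in range(2):
                out[r][c] += weight * term[r][c]
    return out

def depolarize_mixing(rho, p):
    """(1 - 4p/3) rho + (4p/3) I/2."""
    q = 4 * p / 3
    return [[(1 - q) * rho[r][c] + q * (0.5 if r == c else 0.0) for c in range(2)]
            for r in range(2)]

# Assumed example: a valid single-qubit density matrix (Hermitian, trace 1, PSD).
rho = [[0.7, 0.2 - 0.1j], [0.2 + 0.1j, 0.3]]
p = 0.15
out_pauli = depolarize(rho, p)
out_mixing = depolarize_mixing(rho, p)
print(out_pauli)
print(out_mixing)
```

The two outputs agree entry by entry, which is the content of the identity
$I/2 = (\rho + X\rho X + Y\rho Y + Z\rho Z)/4$.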

So far we have seen several simple noise models; to realistically characterize a noisy quan- 
tum system, we need more sophisticated models than the ones introduced here. Please see Chap- 
ter 8 for more details. 


2.4 QUBIT TECHNOLOGIES


This section is a computer scientist's guide to the basics of qubit technologies. But why does a 
computer scientist need the technical know-how in the first place? After all, in today's classical 
computing community, few programmers need any knowledge of how transistors work. Quan- 
tum computers require strict isolation and coherent manipulation of complex, physical systems 
to a level of precision never before attempted. 
Although there already exists an ecosystem of layered quantum software tools and abstractions
that serve as an interface between those layers, it is perhaps premature and fallacious
to follow a model too similar to classical software. Some existing algorithms and systems tools 
are unrealistic in the short term because they were developed with idealistic assumptions about 
the underlying hardware. A decent appreciation of how qubits behave will come in handy when 
designing more efficient algorithms and software systems. For this reason, we dedicate the rest 
of the chapter to a gentle introduction to the leading technologies for realizing quantum com- 
puting devices. 

Today, experimentalists are building qubit systems in carefully controlled laboratory environments.
The leading technologies that may have the potential for realizing scalable quantum
computing include trapped ion qubits, superconducting qubits, semiconductor spin qubits, linear
optics, and Majorana qubits. The general philosophy of qubit design can be summarized
in the DiVincenzo Criteria [87]:


1. scalable system with well-characterized qubits; 
2. ability to initialize qubits (e.g., prepare in computational basis); 
3. stability of qubits (i.e., long decoherence times); 


4. support for a universal instruction set (e.g., single qubit gates and CNOT gate) for arbitrary 
computation; and 


5. ability to measure qubits (e.g., readout in computational basis). 


Note that these goals are in tension with each other. In particular, being able to initialize, 
perform gates, and measure requires interactions between the system and environment, but long 
decoherence times require isolating the system from the environment. This is the fundamentally 
difficult part about building a quantum computer. 

Tremendous progress has been made over the past few decades. A wide range of physical
systems have been shown to have the potential to implement qubits, and some have been
demonstrated with proposals for scalable architectures. In the following, we describe two important
technologies that have attracted the most interest in research labs, large companies, and startups.


2.4.1 TRAPPED ION QUBITS


One of the most natural ways of making a qubit is to use an atomic ion. An atomic ion makes 
a great qubit because its internal energy levels exhibit quantum mechanical properties. In the 
following, we introduce the basics of making trapped ion qubits, and how they can be integrated 
into a quantum computing system. 
Figure 2.6: State transitions for two common types of trapped ion qubits: the optical qubit and
the hyperfine qubit.


Types of Atomic Ion Qubits 

In an atomic ion qubit, the quantum states are represented by two internal energy levels of the
ion (that is, one for $|0\rangle$ and the other for $|1\rangle$). An atomic ion (such as Ca$^+$, Sr$^+$, Ba$^+$, and Yb$^+$)
typically has more than two internal energy levels, so the choice of two of the levels determines
how the qubit is controlled. The following two leading designs pick the energy levels differently,
as shown in Figure 2.6.


* Optical Qubits. On the left, we have an optical qubit because the two chosen energy
levels have a separation of about $10^{15}$ Hz, which is around the frequency of visible light.
The two energy levels are from different orbitals of the ion, $|0\rangle$ from the s orbital and $|1\rangle$
from the d orbital. If the frequency of the laser beam matches the transition frequency
from $|0\rangle$ to $|1\rangle$, the ion is excited after absorbing energy from the laser. The ion also
spontaneously decays from the excited energy state to the lower energy state in
around 1 second. Common ions that can be made into optical qubits include Ca$^+$,
Sr$^+$, Ba$^+$, and Yb$^+$.


* Hyperfine Qubits. In contrast, a hyperfine qubit on the right chooses both energy levels
from the s orbital, and thus has an energy separation of about $10^{10}$ Hz, which falls in
the microwave spectrum [88]. A hyperfine qubit can be directly driven via microwave
control or via Raman transitions, which we will show in greater detail below. Ions
commonly made into hyperfine qubits include Ca$^+$, Sr$^+$, Ba$^+$, Yb$^+$, Be$^+$, Mg$^+$, Hg$^+$,
Cd$^+$, and Zn$^+$. Throughout the rest of the section, we use the $^{171}$Yb$^+$ hyperfine qubit as
an example [89].





Once the qubit states are defined, we need to know how to perform measurement and 
apply high-fidelity single- and two-qubit gates on them. 


Measuring a Qubit 
The measurement of a trapped ion hyperfine qubit is achieved via state-dependent fluorescence [88,
90, 91]. As Figure 2.7 shows, an optical drive is carefully tuned to match a transition from the
$|1\rangle$ state to an energy level in the p orbital, so that a spontaneous decay from the p orbital back
to the s orbital will emit photons, lighting up the ion. In fact, it is slightly detuned from the p level
such that the decay happens instantaneously. However, if the qubit is originally in the $|0\rangle$ state,
such a transition will not happen. The $^2S_{1/2} \leftrightarrow {}^2P_{1/2}$ transition is a cycling transition, which means
that if we continue applying the Raman beams, the ions will remain fluorescent.

Figure 2.7: Measurement outcome is observed by state-dependent fluorescence.


Single-Qubit Gate: Raman or Microwave Transition 

In a trapped ion quantum computer, qubits are controlled by carefully tuned laser beams. When
the frequency of the laser matches the separation between two states, population in the lower
state will be excited to the higher state. For a hyperfine qubit, the qubit states have a separation of
around $10^{10}$ Hz, so state transitions can be controlled directly via microwave pulses, as shown
on the left of Figure 2.8. The advantage of a microwave-controlled single-qubit gate is its low
error rates ($10^{-6}$), while the disadvantage lies in its difficulty in focusing on individual ions due
to its large wavelength. Alternatively, state transitions can happen via Raman transitions, that 
is, first exciting to a p state then decaying back to s states, as shown on the right of Figure 2.8. 
Again, the excitation is slightly detuned so that the spontaneous decay happens instantaneously. 
The Raman transition approach has slightly higher error rates ($10^{-4}$), but targets individual ions
better. Arbitrary-angle single-qubit rotations $R_\phi(\theta)$ can be implemented by tuning the Raman
beat-note. The angle $\theta$ and axis $\phi$ are determined by the duration and phase offset of the Raman
pulses.


Figure 2.8: Single qubit gates via Raman transition or microwave transition.


Two-Qubit Gate: Ising (XX) Gate

The native two-qubit gate in a trapped ion system is called the XX gate or Ising gate [92-95].
Its entangling interaction is achieved via dipole-dipole coupling between two ions. A detuned
Raman transition can apply spin-dependent forces on the ions, which triggers their motional
excitations. The number of different motional modes available to multiple ions is large. Take an example
where the ions physically oscillate in the direction perpendicular to the ion chain. On one hand,
if the two ions oscillate in phase (that is, both moving synchronously up and down), then the
distance between them is unchanged. On the other hand, if the two ions oscillate out of phase
(that is, one going up while the other goes down, and vice versa), then the Coulomb force between
them changes because their distance changes. We thus have a force that is dependent on the state
of the ions. The motional excitations have an entangling effect because they lead to conditional
phase shifts of the ions. Pulse shaping techniques are applied to disentangle the motions at
the end of the gate. In particular, if the two ions are distance $r$ apart and the oscillation is
about $\delta$, then the dipole-dipole coupling leads to a conditional phase shift of $\phi = \Delta E\, t/\hbar$, where
$\Delta E \propto \frac{(e\delta)^2}{2r^3}$. This effective Ising interaction between the ions adds phase shifts depending on
the spin of the ion:


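$$\begin{aligned}
|00\rangle &\mapsto |00\rangle \\
|01\rangle &\mapsto e^{-i\phi} |01\rangle \\
|10\rangle &\mapsto e^{-i\phi} |10\rangle \\
|11\rangle &\mapsto |11\rangle
\end{aligned}$$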


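This is called an XX interaction, because its operator has the form of a $\sigma_x \otimes \sigma_x$ in the
exponent:

$$XX(\phi) = e^{-i\phi\, \sigma_x \otimes \sigma_x} = \begin{pmatrix}
\cos(\phi) & 0 & 0 & -i\sin(\phi) \\
0 & \cos(\phi) & -i\sin(\phi) & 0 \\
0 & -i\sin(\phi) & \cos(\phi) & 0 \\
-i\sin(\phi) & 0 & 0 & \cos(\phi)
\end{pmatrix}$$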
Figure 2.9: Schematics for a RF Paul trap.


This XX interaction can be used to implement the well-known Mølmer-Sørensen gate. For
completeness, we show that the CNOT gate can be implemented using the following circuit:

    q0: --R_y(απ/2)--+----------+--R_y(-απ/2)--R_z(-π/2)--
                     | XX(απ/4) |
    q1: -------------+----------+--R_x(-π/2)--------------

Here the geometric phase of the XX gate is $\chi = \pm\pi/4$, and $\alpha = \mathrm{sgn}(\chi)$.
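
As a sanity check on an XX-based CNOT, the sketch below multiplies out one such gate sequence and
compares it with the CNOT matrix up to a global phase. The sequence used here ($R_y(\alpha\pi/2)$ on $q_0$, then
$XX(\alpha\pi/4)$, then $R_y(-\alpha\pi/2)$ and $R_z(-\pi/2)$ on $q_0$ and $R_x(-\pi/2)$ on $q_1$) and the conventions
$R_a(\theta) = e^{-i\theta\sigma_a/2}$ and $XX(\chi) = e^{-i\chi\,\sigma_x \otimes \sigma_x}$ are assumptions of this example, as are all
function names.

```python
import cmath
import math

def matmul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

def kron(a, b):
    """Tensor product a (x) b; the first factor is qubit q0."""
    return [[x * y for x in ra for y in rb] for ra in a for rb in b]

def rx(t):
    c, s = math.cos(t / 2), math.sin(t / 2)
    return [[c, -1j * s], [-1j * s, c]]

def ry(t):
    c, s = math.cos(t / 2), math.sin(t / 2)
    return [[c, -s], [s, c]]

def rz(t):
    return [[cmath.exp(-1j * t / 2), 0], [0, cmath.exp(1j * t / 2)]]

def xx(chi):
    c, s = math.cos(chi), -1j * math.sin(chi)
    return [[c, 0, 0, s], [0, c, s, 0], [0, s, c, 0], [s, 0, 0, c]]

I2 = [[1, 0], [0, 1]]
CNOT = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 0, 1], [0, 0, 1, 0]]

def cnot_from_xx(alpha):
    """Matrix of the gate sequence; operators compose right to left."""
    pre = kron(ry(alpha * math.pi / 2), I2)
    post = kron(matmul(rz(-math.pi / 2), ry(-alpha * math.pi / 2)), rx(-math.pi / 2))
    return matmul(post, matmul(xx(alpha * math.pi / 4), pre))

def equal_up_to_phase(u, v, tol=1e-9):
    """True if u = c * v for some complex number c with |c| = 1."""
    n = len(v)
    r0, c0 = next((r, c) for r in range(n) for c in range(n) if abs(v[r][c]) > tol)
    phase = u[r0][c0] / v[r0][c0]
    if abs(abs(phase) - 1) > tol:
        return False
    return all(abs(u[r][c] - phase * v[r][c]) < tol for r in range(n) for c in range(n))

ok_plus = equal_up_to_phase(cnot_from_xx(+1), CNOT)
ok_minus = equal_up_to_phase(cnot_from_xx(-1), CNOT)
print(ok_plus, ok_minus)
```

Under these conventions the sequence reproduces CNOT (with $q_0$ as control) for both signs of $\alpha$, which
is why the sign of the geometric phase can be absorbed into the single-qubit rotations.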


Loading (Trapping) Qubits 

In this section, we describe how ions are trapped in place and prepared in their initial
states. In particular, one of the reasons that ions are chosen as qubits is that they are charged
particles, which can feel the forces exerted on them by electromagnetic fields. However, it is not
possible to create a static field that exerts inward forces in all directions, as the number of
electromagnetic field lines going into an enclosed region must equal the number of lines going out. The
best we can do is to create a stable equilibrium in one direction.

The RF (radio-frequency) Paul trap [96, 97] cleverly gets around this issue by applying a
sinusoidal quadrupole electric field around the ions to keep them stationary, as shown in
Figure 2.9.

On the left, four rods of electrodes are shown. Ions are trapped in between the four rods, 
lining up in a chain. The voltages on the rods are applied (as shown on the right) so that the 
chain of ions at the center can be held (on average) stationary. 
Figure 2.10: Schematics for a trapped ion QPU. After being initialized from the optical source on the
left, the laser is split into independently modulated beams, which are then focused on the HOA trap
on the right, providing individual control over the array of ions in the trap.





Trapped Ion Quantum Processing Unit 

Integrating the aforementioned key components into a system [98, 99] yields a trapped-ion QPU
(quantum processing unit), such as the HOA (High Optical Access) trap [100] shown in
Figure 2.10.


2.4.2 SUPERCONDUCTING QUBITS 


In contrast to a trapped ion qubit, a superconducting qubit is implemented with macroscopic, 
lithographically printed circuit elements. The circuit elements are parameterized and configured 
such that they exhibit atom-like energy spectra, hence making an “artificial atom” with desired
quantum mechanical properties. This technology has attracted significant industrial attention 
because it allows convenient design of qubits using existing integrated circuit (IC) technology. 
In the following, we introduce different designs of superconducting qubits, configured to operate 
in various regimes. 


Superconducting Quantum Circuits 

The most distinctive element in a superconducting qubit is an electric circuit element called
the Josephson junction [101, 102]. It is an insulator sandwiched between two superconductors.
Below its critical temperature, a superconductor's resistance drops to zero, and
pairs of electrons in the superconducting material form bonds, thus making them Cooper pairs.
Ordinarily, single electrons have spin $\pm\frac{1}{2}$; they are particles commonly referred to as fermions.
But after forming Cooper pairs, they have a total spin of 0 (making them particles with integer
spins, which are commonly referred to as bosons). With the superconductors in a Josephson
junction, Cooper pairs can tunnel through the insulator in a quantized fashion (that is, one pair
at a time), thus giving rise to the discrete energy levels needed for making a qubit. A superconducting
qubit state is thus related to the number of Cooper pairs tunneled across the junction.

The Josephson junction element is used to implement what is known as an anharmonic
oscillator. In contrast with a harmonic oscillator, where the energy levels are equally spaced by $\hbar\omega$,
an anharmonic oscillator has unequal energy spacing. This “nonlinearity” is convenient because
now we can drive the transitions between only two of the energy levels (usually the two lowest
levels) represented as qubit states for computation without exciting the other levels of the system.
Normally, the higher the anharmonicity (i.e., the difference between $\hbar\omega_{01}$ and $\hbar\omega_{12}$), the better we
can individually address the computational states. However, in practice, anharmonicity also sets
a limit on the speed of gate pulses we can apply to the qubit.

Figure 2.11: Types of superconducting qubits. Left: Circuit diagram for charge qubits (when
$E_J < E_C$) and transmon qubit (when $E_J > E_C$), consisting of capacitor $C$ and Josephson
junction $J$. Center: Circuit diagram for a C-shunted flux qubit, where a junction is shunted
with a number of junctions. Right: Circuit diagram for a phase qubit with current bias $I_0$.


Types of Superconducting Qubits 
Figure 2.11 shows the common types of superconducting qubits.


• Charge qubits. A charge qubit defines its computational qubit state as the number of
Cooper pairs on a superconducting island; this class includes the Cooper-pair box and
the transmon qubit. Using the circuit shown in Figure 2.11, the superconducting island
is located between one plate of the capacitor and the insulator of the Josephson
junction. Operating in the “charge regime” (that is, $E_J < E_C$), the qubit is controlled
by a voltage source, which induces charge differences between the two sides of the
Josephson junction. The qubit state $|0\rangle$ is given by the lack of Cooper pairs in the island,
while $|1\rangle$ is given by the presence of a single Cooper pair. The charge qubit is also
known as the Cooper pair box [103-105]. The community has found that, in the charge
regime, a qubit becomes highly sensitive to charge noise, making it hard to keep
coherent. Over time, more and more attention has been put on the flux regime (that
is, $E_J \gg E_C$), which trades charge noise for flux noise that appears to be more manageable.
One can operate in the flux regime with $E_J \gg E_C$ (typically $E_J/E_C > 50$)
by shunting the junction with a large capacitor, thus making the shunt capacitance much
larger than the junction capacitance and $E_C$ small.
This is commonly known as the transmon qubit [106].


• Flux qubits. In a flux qubit (as well as a fluxonium qubit [61, 107, 108]), the single
Josephson junction is replaced by a SQUID (superconducting quantum interference
device) [109]. A SQUID consists of a loop interrupted by a number of Josephson junctions,
where the effective critical current can be decreased by applying external magnetic
flux $\Phi_{ext}$ through the loop. Thus, the effective $E_J$ is tunable via changing the
SQUID’s critical current using external flux. For a flux qubit, the energy levels
correspond to the integer number of superconducting flux quanta induced in the loop:
$\varphi_1 - \varphi_2 + 2\varphi_{ext} = 2\pi k$, where $\varphi_{ext} = \pi\Phi_{ext}/\Phi_0$ and $\Phi_0 = h/(2e)$ is a magnetic flux
quantum. The discussions of superconducting qubits in the rest of the section will be
centered around flux qubits as well as some of their variants, such as fluxonium [110].


• Phase qubits. A phase qubit [111, 112] defines its computational states using the quantum
charge oscillation amplitudes across the Josephson junction, controlled with current
biases. In contrast to a flux qubit, a phase qubit operates in a regime where
$E_J/E_C \sim 10^6$.


Here we briefly describe how to perform measurement and elementary gate operations on 
superconducting qubits, following [113-115]. 


Measuring a Qubit 

Qubit readout is typically performed via a technique called dispersive readout [116-119], which
determines the qubit state via the state-dependent frequency shift of a resonator coupled to each qubit.
In the dispersive regime, where the detuning between the qubit and the resonator is large
compared to their coupling rate, that is, $|\omega_r - \omega_q| \gg g$, the qubit and the resonator push each other's
frequencies with dispersive shifts. Since the shift on the resonator is state-dependent, we can
use the changes in the frequency of the resonator to probe the state of the qubit without directly
interacting with the qubit itself.


Single-Qubit Gate: Charge and Flux Drive 
For transmon-like superconducting qubits, there are generally two classes of controls that drive 
individual qubits: (i) microwave control via a capacitively coupled resonator or feedline; it imple- 
ments single-qubit rotations (along x axis and y axis); and (ii) flux control via external magnetic 
field; it implements z-axis single-qubit rotations or can be used to tune the frequency of qubits, 
as shown in Figure 2.12. 

To enable microwave control, we couple the superconducting qubit to a microwave source 
(commonly referred to as charge drive or qubit drive) via a capacitor. The time-dependent voltage 
applied to the qubit can be written in a generic form 


$$V_d(t) = V_0 v(t) = V_0 s(t)\big(\sin(\omega_d t)\cos(\phi) + \cos(\omega_d t)\sin(\phi)\big),$$


where $V_0$ is the pulse amplitude, $s(t)$ is a dimensionless (baseband) envelope function, $\omega_d$ is the
driving frequency, $\phi$ is the phase offset determined arbitrarily, $\sin(\omega_d t)\cos(\phi) = \sin(\omega_d t) I$ is
the in-phase component of the pulse, and $\cos(\omega_d t)\sin(\phi) = \cos(\omega_d t) Q$ is the out-of-phase
component of the pulse. Techniques like the rotating wave approximation (RWA) can be used to
show that, if the driving frequency equals the qubit frequency, the in-phase pulse corresponds
to x-axis single-qubit rotations and the out-of-phase pulse corresponds to y-axis single-qubit
rotations. More concretely, taking an X gate as the example, we use an AWG (arbitrary waveform
generator) to produce the pulse shape and send the signal in-phase through the qubit drive.
The shape of the pulse is determined by the baseband $s(t)$ and the amplitude $V_0$, which can be
derived by solving for the total phase gained during time $t$. Similarly for Y control, we solve for
the pulse shape and send the signal out-of-phase through the qubit drive. Details are omitted
here; we refer the interested reader to the tutorial in [113].

Figure 2.12: Left: Qubit frequencies as a function of external magnetic flux. The first three levels
of the transmon, $\omega_{01}$ and $\omega_{12}$, are plotted. Right: Circuit diagram for a frequency-tunable
(asymmetric) transmon qubit (highlighted in black), consisting of a capacitor and two asymmetric
Josephson junctions. Highlighted in gray are two control lines: the external magnetic
flux control $\Phi$ and microwave voltage drive line $V_d(t)$ for each transmon qubit.

The choice of phase offset $\phi$ is arbitrary. Suppose we set $\phi \leftarrow \phi + \pi/2$; then the in-phase
and out-of-phase components are swapped (up to a change of sign): $I$ becomes $Q$ and $Q$
becomes $-I$. Recall from Section 2.2 that $ZX = iY$ and $ZY = -iX$. So shifting a phase in the AWG
is equivalent to applying a Z gate. This way of implementing a Z gate by shifting the phase of
subsequent pulses is called the virtual Z strategy [120].
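
One way to see why a phase shift acts like a Z rotation is to compare matrices. The sketch below
assumes the common convention that a resonant pulse with phase offset $\phi$ and rotation angle $\theta$
implements $R_\phi(\theta) = e^{-i\frac{\theta}{2}(\cos\phi\, X + \sin\phi\, Y)}$, and checks that this equals an X rotation sandwiched
between Z rotations. The convention and the function names are assumptions of this example.

```python
import cmath
import math

def matmul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

def rx(theta):
    c, s = math.cos(theta / 2), math.sin(theta / 2)
    return [[c, -1j * s], [-1j * s, c]]

def ry(theta):
    c, s = math.cos(theta / 2), math.sin(theta / 2)
    return [[c, -s], [s, c]]

def rz(phi):
    return [[cmath.exp(-1j * phi / 2), 0], [0, cmath.exp(1j * phi / 2)]]

def r_axis(phi, theta):
    """Rotation by theta about the axis cos(phi) X + sin(phi) Y in the xy-plane."""
    c, s = math.cos(theta / 2), math.sin(theta / 2)
    return [[c, -1j * s * cmath.exp(-1j * phi)],
            [-1j * s * cmath.exp(1j * phi), c]]

def close(a, b, tol=1e-12):
    return all(abs(a[r][c] - b[r][c]) < tol for r in range(2) for c in range(2))

theta, phi = 0.7, 1.1  # arbitrary example angles

# Shifting the pulse phase by phi is the same as a frame change by a Z rotation.
frame_change = matmul(rz(phi), matmul(rx(theta), rz(-phi)))
same_as_frame_change = close(r_axis(phi, theta), frame_change)

# A phase shift of pi/2 turns an X rotation into a Y rotation.
x_becomes_y = close(r_axis(math.pi / 2, theta), ry(theta))

# The Pauli identities quoted in the text: ZX = iY and ZY = -iX.
X, Y, Z = [[0, 1], [1, 0]], [[0, -1j], [1j, 0]], [[1, 0], [0, -1]]
zx_is_iy = close(matmul(Z, X), [[1j * v for v in row] for row in Y])
zy_is_minus_ix = close(matmul(Z, Y), [[-1j * v for v in row] for row in X])
print(same_as_frame_change, x_becomes_y, zx_is_iy, zy_is_minus_ix)
```

Because the Z rotation only changes the reference frame of later pulses, it costs no pulse time, which
is the appeal of implementing it in software.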


Two-Qubit Gate: Flux or Microwave Control 
Typical native two-qubit gates in a superconducting architecture include the iSWAP gate [60, 
121-124] (for flux-tunable transmons), CZ (controlled-phase) gate [61, 125] (also for flux- 
tunable transmons), and CR (cross-resonance) gate [126-129]. 

Figure 2.13: Two-qubit interactions for two capacitively coupled transmons. Left: Two-qubit
gates are implemented with resonance of qubit frequencies. Shown here is how qubit frequencies
are tuned for the iSWAP gate and the CZ gate. Right: Circuit diagram of two capacitively coupled
transmon qubits.

For two flux-tunable transmons (coupled via a capacitor), the strategy for making two
qubits interact is to tune the frequencies of the transmons such that energy exchange happens through
capacitive coupling. More specifically, if we tune the frequency of the first qubit $\omega_{01}^{(1)}$ to match the
frequency of the second qubit $\omega_{01}^{(2)}$ (often referred to as tuning the two qubits on resonance), we
enable a periodic population swap between the basis states $|10\rangle$ and $|01\rangle$, as shown in Figure 2.13.
This is sometimes called the XY interaction:

$$XY[t] = e^{-i\frac{g}{2}(\sigma_x \otimes \sigma_x + \sigma_y \otimes \sigma_y)t} = \begin{pmatrix}
1 & 0 & 0 & 0 \\
0 & \cos(gt) & -i\sin(gt) & 0 \\
0 & -i\sin(gt) & \cos(gt) & 0 \\
0 & 0 & 0 & 1
\end{pmatrix},$$


where $g$ is the coupling strength of the capacitor. Note that when we tune the qubits on resonance
for time $\pi/(2g)$, we obtain the iSWAP gate:

$$XY[\pi/2g] = \mathrm{iSWAP} = \begin{pmatrix}
1 & 0 & 0 & 0 \\
0 & 0 & -i & 0 \\
0 & -i & 0 & 0 \\
0 & 0 & 0 & 1
\end{pmatrix}.$$

The $\sqrt{\mathrm{iSWAP}}$ gate, which is equivalent to $XY[\pi/4g]$, is sometimes useful as well.
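
The relation between the XY interaction time and the resulting gate can be checked with a few lines
of Python. The sketch assumes the matrix form of $XY[t]$ with entries $\cos(gt)$ and $-i\sin(gt)$ on the
$\{|01\rangle, |10\rangle\}$ subspace and uses an arbitrary coupling strength `g = 1.0`; names are illustrative.

```python
import math

def matmul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

def xy(t, g):
    """XY interaction for time t with coupling strength g."""
    c, s = math.cos(g * t), -1j * math.sin(g * t)
    return [[1, 0, 0, 0],
            [0, c, s, 0],
            [0, s, c, 0],
            [0, 0, 0, 1]]

def close(a, b, tol=1e-12):
    n = len(a)
    return all(abs(a[r][c] - b[r][c]) < tol for r in range(n) for c in range(n))

g = 1.0  # assumed coupling strength (arbitrary units)
iswap = xy(math.pi / (2 * g), g)
sqrt_iswap = xy(math.pi / (4 * g), g)

# In this sign convention the swapped amplitudes pick up a factor of -i.
expected = [[1, 0, 0, 0], [0, 0, -1j, 0], [0, -1j, 0, 0], [0, 0, 0, 1]]
is_iswap = close(iswap, expected)

# Applying the half-duration gate twice gives the full iSWAP.
squares_to_iswap = close(matmul(sqrt_iswap, sqrt_iswap), iswap)
print(is_iswap, squares_to_iswap)
```

Since $XY[t_1]\,XY[t_2] = XY[t_1 + t_2]$, any fraction of the iSWAP is obtained simply by shortening the
time the qubits spend on resonance.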

Alternatively, if we tune the frequency of the first qubit $\omega_{01}^{(1)}$ to match the secondary
frequency of the second qubit $\omega_{12}^{(2)}$, we enable the periodic population swap between basis states $|11\rangle$
and $|02\rangle$. This is hardly desirable if we leave the population at $|2\rangle$, which is beyond the computational
subspace ($|0\rangle$ and $|1\rangle$); this phenomenon is known as “leakage.” However, when the population
is swapped to $|02\rangle$ and back to $|11\rangle$, we gain an $e^{i\pi}$ phase on $|11\rangle$. In effect, we have accomplished
the CZ gate. In fact, one can choose the interaction trajectory such that an arbitrary phase $e^{i\phi}$
is gained on $|11\rangle$:

$$CZ_\phi = \begin{pmatrix}
1 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 \\
0 & 0 & 1 & 0 \\
0 & 0 & 0 & e^{i\phi}
\end{pmatrix}$$

Both the iSWAP and the CZ gates are useful primitives, as they can be used to implement
the CNOT gate.
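
For the CZ case, one well-known construction conjugates the target qubit with Hadamard gates. The
sketch below checks it numerically; the choice of the first qubit as control, the form
$CZ_\phi = \mathrm{diag}(1, 1, 1, e^{i\phi})$ with $\phi = \pi$, and the helper names are assumptions of this example, and the
construction is not necessarily the circuit intended by the text.

```python
import cmath
import math

def matmul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

def kron(a, b):
    """Tensor product a (x) b; the first factor is the control qubit."""
    return [[x * y for x in ra for y in rb] for ra in a for rb in b]

def cz(phi):
    """Controlled-phase gate that multiplies |11> by exp(i * phi)."""
    return [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, cmath.exp(1j * phi)]]

s = 1 / math.sqrt(2)
H = [[s, s], [s, -s]]
I2 = [[1, 0], [0, 1]]
CNOT = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 0, 1], [0, 0, 1, 0]]

# Hadamard on the target, CZ, Hadamard on the target again.
h_target = kron(I2, H)
cnot_from_cz = matmul(h_target, matmul(cz(math.pi), h_target))

max_error = max(abs(cnot_from_cz[r][c] - CNOT[r][c]) for r in range(4) for c in range(4))
print(max_error)
```

The construction works because $HZH = X$: when the control is $|1\rangle$ the CZ applies $Z$ to the target, which
the Hadamards turn into a bit flip.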
One can also implement two-qubit gates using only microwave control. Instead of tuning
the qubit frequencies via external magnetic fluxes, the CR (cross-resonance) gate [129] achieves
the two-qubit interaction on two (fixed-frequency) transmons coupled with a resonator via the
qubit drive. In particular, if we drive the first qubit at the frequency of the second qubit, then
the Rabi oscillation of the second qubit will have a frequency dependent on the state of the first
qubit. This is sometimes referred to as the ZX interaction:

$$CR(\theta) = e^{-i\frac{\theta}{2}\sigma_z \otimes \sigma_x} = \begin{pmatrix}
\cos(\theta/2) & -i\sin(\theta/2) & 0 & 0 \\
-i\sin(\theta/2) & \cos(\theta/2) & 0 & 0 \\
0 & 0 & \cos(\theta/2) & i\sin(\theta/2) \\
0 & 0 & i\sin(\theta/2) & \cos(\theta/2)
\end{pmatrix}$$


The CR gate can be used to implement a CNOT gate (up to a phase $e^{i\pi/4}$):

    q0: --R_z(π/2)--+-----------+--
                    | CR(-π/2)  |
    q1: --R_x(π/2)--+-----------+--
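
The CR-based CNOT can be verified by direct matrix multiplication. The sketch below assumes
$CR(\theta) = e^{-i\frac{\theta}{2}\sigma_z \otimes \sigma_x}$ with the first qubit as control, the rotation conventions
$R_a(\theta) = e^{-i\theta\sigma_a/2}$, and the sequence $R_z(\pi/2)$ on $q_0$, $R_x(\pi/2)$ on $q_1$, and $CR(-\pi/2)$; all
function names are illustrative.

```python
import cmath
import math

def matmul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

def kron(a, b):
    """Tensor product a (x) b; the first factor is qubit q0."""
    return [[x * y for x in ra for y in rb] for ra in a for rb in b]

def rx(t):
    c, s = math.cos(t / 2), math.sin(t / 2)
    return [[c, -1j * s], [-1j * s, c]]

def rz(t):
    return [[cmath.exp(-1j * t / 2), 0], [0, cmath.exp(1j * t / 2)]]

def cr(theta):
    """exp(-i * theta/2 * Z (x) X): an X rotation whose direction depends on q0."""
    c, s = math.cos(theta / 2), 1j * math.sin(theta / 2)
    return [[c, -s, 0, 0], [-s, c, 0, 0], [0, 0, c, s], [0, 0, s, c]]

CNOT = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 0, 1], [0, 0, 1, 0]]

# The three factors commute with each other, so their order does not matter.
circuit = matmul(kron(rz(math.pi / 2), rx(math.pi / 2)), cr(-math.pi / 2))

# CNOT should equal exp(i*pi/4) times the circuit.
phase = cmath.exp(1j * math.pi / 4)
max_error = max(abs(phase * circuit[r][c] - CNOT[r][c]) for r in range(4) for c in range(4))
print(max_error)
```

The check confirms that the global phase relating the circuit to CNOT is $e^{i\pi/4}$ under these
conventions.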

2.4.3 OTHER PROMISING IMPLEMENTATIONS


Besides superconducting and trapped-ion architectures, there are other promising QC platforms.
Due to limited space, we will not describe these platforms in detail, but refer the reader
to the relevant literature: semiconductor spin qubits [130-133], linear optics [134-136], and
Majorana qubits [137-140].
CHAPTER 3


Quantum Application Design 


What kinds of computational problems are quantum computers good at solving? What are the
major challenges for algorithm designers in the NISQ era? This chapter is devoted to updating
readers on recent progress and upcoming challenges in near-term algorithm design.
Quantum machines will be useless if we do not know what algorithms we can run on them.
So a big part of the race to practical quantum computing is to develop algorithms that run on
NISQ computers. We lead with a section illustrating the general features of quantum information
processing, how to exploit the so-called quantum parallelism, and how to evaluate the cost
of a quantum program. After answering the above questions, we introduce a few medium-scale
quantum algorithms such as the Deutsch-Jozsa algorithm and the Bernstein-Vazirani algorithm,
as well as some other classes of algorithms tailored for NISQ computers, such as the
Variational Quantum Eigensolver (VQE) and the Quantum Approximate Optimization Algorithm
(QAOA). We conclude this chapter with a survey of promising quantum applications.
In the next 5-10 years, we probably won't have quantum processors built into our laptops or
smartphones. But they will become useful for synthesizing better drugs and materials, producing
lower-energy fertilizer, solving optimization problems more efficiently, and so on.


3.1 GENERAL FEATURES


We start the discussion on quantum algorithms with a summary of their general features. The
power of quantum computing can be viewed as ultimately coming from the ability to encode
an exponential computational space into just a linear number of computational units: the state of
an entangled $n$-qubit quantum system is represented by $2^n$ complex coefficients. Each
constant-time operation on the quantum system, in principle, manipulates non-trivially all $2^n$ complex
coefficients, through $O(n)$ independent knobs (e.g., $I$, $X$, $Y$, and $Z$ controls on each qubit). At
the end of a quantum algorithm, one expects to measure the $n$ qubits and obtain a random outcome
of $n$ classical bits. The art of designing quantum algorithms is thus to find transformations
on the state of the qubits after which the final measurements yield the desired outcome with high
probability. In the following, we elaborate on this design process, followed by a few remarkable
example quantum algorithms.

Figure 3.1: A typical quantum circuit implementation of a quantum algorithm.


3.1.1 THE COMPUTING PROCESS 


The computational process of a quantum computer generally requires manipulating qubits in 
a particular manner, so as to fully exploit the power of quantum information processing. A 
prescription commonly observed in quantum algorithms is: 


* efficiently encode information into a small number of qubits; 
* cleverly build up entanglement and interference during the algorithm; and 
* design a final measurement that yields desired outcomes with high probability. 


As such, the implementation of a quantum algorithm can typically be expressed as a quantum
circuit of the form shown in Figure 3.1.

Even though a quantum computer is said to have the power of manipulating an exponen- 
tially sized computational space, we must not start with an exponentially complex initial state. 
Otherwise the state preparation circuit that puts the qubits into such state would be extremely 
expensive. 

In the following section, we describe in greater detail one of the key ingredients in quantum
information processing: quantum parallelism. To better illustrate this phenomenon, we
switch our mindset to the query model of computation.


3.1.2 THE QUERY MODEL AND QUANTUM PARALLELISM 


A query model involves a black-box function (e.g., $f : \{0,1\}^n \to \{0,1\}^m$) called an oracle (as
shown in Figure 3.2). An algorithm tries to learn some properties of $f$ by evaluating $f$ on a
set of inputs, without examining how $f$ is implemented internally. We can imagine that the oracle $O_f$
is implemented as some private function (unknown to its user). The only way to learn about the oracle
is by passing some inputs to it and analyzing the outputs. The number of queries required by the
algorithm is defined as the query complexity of the algorithm. In a nutshell, a query algorithm is
a prescription of how to (efficiently) learn properties of a function $f$ of interest.

Figure 3.2: An oracle (or black-box function) that computes $f(x)$.


Here, we give some example problems well studied in the query model. 


* Searching Problem: Find x such that f(x) = 1 [13]. 


* Period-finding Problem: Find the period of f when inputs x are ordered from 0...0 to 
1...1, that is, find p such that f(x) = f(x + p) for all x [12]. 


* Collision Problem: Find x,y such that f(x) = f(y) (often used for analyzing 
graphs) [141, 142]. 


Quantum Query Algorithm vs. Classical Query Algorithm 

Intuitively, the advantage of a quantum query algorithm is that we can pass several inputs as a 
superposition quantum state to the oracle (as shown in Figure 3.2) at one time. In return, we 
get a new superposition state as output. In contrast, a classical oracle only accepts a single input 
at each time, which limits the information we can get from each query. 


Quantum Oracles 

Consider a function $f : \{0,1\}^n \to \{0,1\}^m$, meaning that $f$ takes an $n$-bit input and returns an
$m$-bit output. In Chapter 2, we showed that $f$ must be made reversible when implemented in a
quantum circuit. In order to make the oracle reversible, we add some output qubits to the circuit
to make the output width and input width identical. In the following quantum circuits, we call
$|x\rangle$ the input qubits (or input registers) and $|y\rangle$ the output qubits (or output registers, or ancilla).


* XOR oracle. Oracle $O_f$ is the XOR oracle that implements function $f$. It transforms
a quantum state from $|x\rangle \otimes |y\rangle$ to $|x\rangle \otimes |y \oplus f(x)\rangle$. Note that if $|y\rangle = |0\rangle$, we obtain
$|x\rangle \otimes |f(x)\rangle$.


* Phase oracle. $O_f^{\pm}$ is called the phase oracle of function $f$. It transforms a quantum
state from $|x\rangle \otimes |y\rangle$ to $|x\rangle \otimes (-1)^{f(x) \cdot y} |y\rangle$, where $f(x) \cdot y = \sum_i f(x)_i y_i \bmod 2$ is
the inner product of the two bit-strings.


The XOR oracle and the phase oracle are equivalent, which means that each of them can
be simulated by the other. We can build a phase oracle using the XOR oracle and vice versa.

It is sometimes convenient to simplify the oracle for $f : \{0,1\}^n \to \{0,1\}^m$ when $m = 1$,
that is, when $f$ is a decision problem (1 for true and 0 for false). In this case, we prepare the output
qubit in the state $|-\rangle = \frac{1}{\sqrt{2}}(|0\rangle - |1\rangle)$. This is because the output of the XOR oracle is then:


$$|x\rangle \otimes |{-} \oplus f(x)\rangle = \frac{1}{\sqrt{2}} |x\rangle \otimes \left(|0 \oplus f(x)\rangle - |1 \oplus f(x)\rangle\right).$$


Considering $f(x)$ is either 0 or 1, we have the output:


$$\frac{1}{\sqrt{2}} |x\rangle \otimes \left(|0 \oplus f(x)\rangle - |1 \oplus f(x)\rangle\right) = |x\rangle \otimes (-1)^{f(x)} \frac{1}{\sqrt{2}} \left(|0\rangle - |1\rangle\right) = |x\rangle \otimes (-1)^{f(x)} |-\rangle = (-1)^{f(x)} |x, -\rangle.$$


Hence, the oracle maps $|x, -\rangle$ to $(-1)^{f(x)} |x, -\rangle$. As we don't care about the last qubit, we
can simplify the system by ignoring the output qubit as follows:


$$O_f^{\pm} |x\rangle = (-1)^{f(x)} |x\rangle.$$


Notice that a quantum oracle differs from a classical one by the ability to apply the function
$f$ to a superposition of states simultaneously, a phenomenon commonly referred to as “quantum
parallelism.”
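
The phase kickback derived above can be illustrated with a small state-vector sketch. The decision function `f` (marking the input 11) and the bit ordering (output qubit as the least significant bit of the basis index) are assumptions made for this example only.

```python
import math

def xor_oracle(state, f):
    """Apply |x, y> -> |x, y XOR f(x)> to a state vector.

    Basis index = 2*x + y, i.e., the output qubit is the least significant bit.
    """
    out = [0.0] * len(state)
    for idx, amp in enumerate(state):
        x, y = idx >> 1, idx & 1
        out[2 * x + (y ^ f(x))] += amp
    return out

def x_minus(x, n):
    """State vector of |x>|->, with |-> = (|0> - |1>)/sqrt(2)."""
    state = [0.0] * (2 ** (n + 1))
    state[2 * x] = 1 / math.sqrt(2)
    state[2 * x + 1] = -1 / math.sqrt(2)
    return state

n = 2

def f(x):
    """Hypothetical decision function that marks the input x = 11."""
    return 1 if x == 3 else 0

# For every basis input x, the XOR oracle should act as the phase (-1)^f(x).
kickback_ok = True
for x in range(2 ** n):
    before = x_minus(x, n)
    after = xor_oracle(before, f)
    expected = [(-1) ** f(x) * amp for amp in before]
    if any(abs(a - b) > 1e-12 for a, b in zip(after, expected)):
        kickback_ok = False

print(kickback_ok)  # prints True
```

Because the oracle is linear, the same sign pattern appears on every term of a superposition of inputs, which is exactly what the simplified phase oracle expresses.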


3.1.3 COMPLEXITY, FIDELITY, AND BEYOND 


In quantum computing, there are generally two notions of costs: (i) device-independent compu- 
tational complexity and (ii) device-dependent implementation cost. The two notions typically 
rely on different assumptions, yet have close connection with each other, e.g., circuit complexity 
(i.e., number of gates) is related to running time (i.e., number of steps). 


Computational Complexity 
Following the definition of classical computational complexity, we can define quantum compu- 
tational complexity, in which we characterize general properties of quantum algorithms. As we 
shall see in Chapter 6, we can efficiently simulate a quantum circuit using one universal gate set 
with another universal gate set, which allows us to define complexity classes independent of the 
implementation details, such as the choice of gate set, and the accuracy of the quantum gates. 
The computational complexity of quantum algorithms is generally analyzed in two differ- 
ent styles, namely the time complexity and the query complexity. 


* Time complexity. The time complexity of a unitary transformation U is related to the 
number of gates of the smallest circuit that implements U. In most cases it is hard to 
find the time complexity for a quantum algorithm, as we need to prove some circuit is 
an implementation of U and also there are no other smaller circuits for U. 


* Query complexity. Query complexity is the number of times an algorithm needs to query
a given black-box function (often called an oracle) to solve a problem. For many query-based
quantum algorithms, such as Grover's algorithm, the query complexity is easier
to analyze using the query model introduced in Section 3.1.2.

Before we introduce the quantum computational complexity class BQP (bounded-error
quantum polynomial time), we first review two classical complexity classes, namely P and BPP,
using the circuit model of computation.


Definition 3.1 The complexity class P is the class of all decision problems solvable by a
polynomial-size uniform circuit family (with classical AND/OR/NOT gates) $\{C_n : n \in \mathbb{N}\}$
deterministically.


Here, decision problems are functions that take an $n$-bit input and produce a 1-bit output (i.e.,
a Yes/No answer). Polynomial-size uniform means that there exists a polynomial-time deterministic
Turing machine that outputs a description of the polynomial-size circuit $C_n$ for every input
size $n$. This is the class of problems that are considered efficiently solvable.


Definition 3.2 The complexity class BPP (bounded-error probabilistic polynomial time) is the
class of all decision problems solvable by a polynomial-size uniform random circuit family (with
classical AND/OR/NOT gates and coin flips) $\{C_n : n \in \mathbb{N}\}$ with high probability.


Bounded (two-sided) error means that the circuit $C_n$ computes a Boolean function $f$ such that


* $\forall x \in \{0,1\}^n$, if $f(x) = 1$, then $\Pr[C_n(x) = 1] > 2/3$;


* $\forall x \in \{0,1\}^n$, if $f(x) = 0$, then $\Pr[C_n(x) = 1] < 1/3$.


Notice that, in contrast to P, BPP allows coin flips in its circuit, and is the notion of efficiently 
solvable in randomized computation. 


Definition 3.3 The complexity class BQP (bounded-error quantum polynomial time) is the
class of all decision problems solvable by a polynomial-size uniform quantum circuit family (with
universal quantum gates) $\{C_n : n \in \mathbb{N}\}$ with high probability.


From BPP to BQP, we replace the random circuit with a quantum circuit. Notice that the
class BPP is contained in BQP (i.e., $\mathrm{BPP} \subseteq \mathrm{BQP}$), simply because we can simulate a fair coin
flip by preparing the superposition state $\frac{1}{\sqrt{2}}(|0\rangle + |1\rangle)$ and then measuring in the computational
basis $\{|0\rangle, |1\rangle\}$. The relations of BQP to many other classical complexity classes are still exciting
open research problems. We provide a proof of another such relation, $\mathrm{BQP} \subseteq \mathrm{PSPACE}$ (polynomial
space), later in Chapter 9. For now, we turn our discussion to the realistic implementation
cost of quantum algorithms on a given hardware.
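
The coin-flip argument can be sketched in a few lines: a Hadamard gate applied to $|0\rangle$ gives equal amplitudes, and a measurement samples each outcome with probability given by the squared amplitude. The sampling below uses a classical pseudo-random generator purely to mimic the measurement statistics; the function names are illustrative.

```python
import math
import random

def hadamard_on_zero():
    """Amplitudes of H|0> = (|0> + |1>)/sqrt(2)."""
    s = 1 / math.sqrt(2)
    return [s, s]

def measure(state, rng):
    """Sample a computational-basis outcome with Born-rule probabilities."""
    p0 = abs(state[0]) ** 2
    return 0 if rng.random() < p0 else 1

rng = random.Random(0)          # fixed seed so that runs are repeatable
state = hadamard_on_zero()
flips = [measure(state, rng) for _ in range(1000)]

# The empirical frequency of outcome 1 should be close to 1/2.
print(sum(flips) / len(flips))
```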


Implementation Cost 

In practice, we care about the resource requirement to implement an algorithm on the physical
device. There are several main concerns regarding implementations of quantum algorithms. The
first issue we encounter is precision. Relaxing the idealized assumption that quantum gates are
implemented with perfect accuracy, we must consider the impact of noise on the quantum circuit
implementations. The accuracy of a quantum circuit implementation is typically defined by some
distance measure between an ideal vs. a noisy process. The commonly used distance measures can
be categorized in two classes: (i) input dependent and (ii) input independent. When the input
state $\rho_{\text{in}}$ is given, we may quantify the accuracy of an implementation by measuring the distance in
the output quantum states:


$$D\left(U_{\text{noisy}}\, \rho_{\text{in}}\, U_{\text{noisy}}^{\dagger},\; U_{\text{ideal}}\, \rho_{\text{in}}\, U_{\text{ideal}}^{\dagger}\right),$$


where $D(\cdot)$ is a distance metric. Some common metrics between two quantum states are fidelity
and trace distance, which are defined in Chapter 8. Alternatively, we can also directly measure
the distance between two processes, regardless of the input state:


$$D\left(U_{\text{noisy}},\; U_{\text{ideal}}\right).$$


It is also reasonable to compare measurement outcomes of two quantum circuits, instead
of the quantum states themselves; this is typically the case in the context of classical simulation
of quantum computation. Since measurement outcomes are essentially probability distributions,
we adopt metrics such as total variation distance for comparing two distributions, as defined in
Chapter 9.

When all of the above metrics are difficult to calculate, for example, when a quantum computer
has insufficient power (e.g., not enough qubits for the target quantum circuit) or when an
efficient classical simulator is unavailable, a common last resort is to estimate the success rate of
a quantum circuit using efficiently computable noise models. For instance, one can estimate the
(worst-case) success rate of a quantum circuit under qubit decoherence and gate noise:


$$P_{\text{success}} = \prod_{g \in G} (1 - \epsilon_g) \cdot \prod_{q \in Q} (1 - \epsilon_q),$$


where $G$ is the set of gates and $Q$ is the set of qubits in the circuit, $\epsilon_g$ is the average gate
error rate, and $\epsilon_q$ is captured by modeling $T_1$ and $T_2$ during idle or
gate time, as shown in Section 2.3.3.
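
A minimal sketch of this estimate follows. The exponential decoherence model in `decoherence_error` and all numerical values (gate error rates, circuit duration, $T_1$, $T_2$) are assumptions chosen for illustration, not figures from any particular device.

```python
import math

def decoherence_error(duration, t1, t2):
    """Assumed model: probability that a qubit decoheres within `duration`."""
    return 1 - math.exp(-duration / t1 - duration / t2)

def success_rate(gate_errors, qubit_errors):
    """Product of (1 - error) over all gates and all qubits."""
    p = 1.0
    for eps in gate_errors:
        p *= 1 - eps
    for eps in qubit_errors:
        p *= 1 - eps
    return p

# Hypothetical circuit: 20 single-qubit gates, 10 two-qubit gates, 3 qubits.
gate_errors = [0.001] * 20 + [0.01] * 10
circuit_time_us = 2.0                      # total running time in microseconds
qubit_errors = [decoherence_error(circuit_time_us, t1=50.0, t2=70.0)] * 3

p_success = success_rate(gate_errors, qubit_errors)
print(round(p_success, 3))
```

Because every factor is at most 1, adding gates or lengthening the circuit can only lower the estimate, which is why gate count and depth are useful proxies for success rate.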

The second issue in implementing quantum algorithms is resource cost. In practice, we can
further optimize a circuit implementation by strategically choosing the most robust qubits and
most robust gates, or by shortening the running time of the circuit. This motivates a number of
resource metrics for describing the cost of an implementation. We list some of them as
follows.


* Qubit count. Also referred to as circuit width. It represents the number of qubits the
algorithm needs, and limits the dimension under which the algorithm operates.


* Gate count. The number of quantum gates used in the given algorithm. For NISQ
algorithms, this is a useful metric for obtaining a general sense of their success rate,
as gate errors are the most dominant source of noise in today's NISQ machines.


* Circuit depth. The number of time steps the algorithm uses. Note that we need to find
the deepest path in the circuit from the input to the output. This is often referred to as the
"critical path," a notion generalized from classical Boolean circuits. Circuit depth is closely
related to the "running time" of a quantum circuit. Higher circuit depth typically correlates
with lower success rate, because deeper circuits suffer from noise due to the higher
gate count as well as due to the higher likelihood of experiencing qubit decoherence.


* Communication cost. Because of physical limits, we need to take extra effort to move
qubits around to enable the actual computation. This often happens when the
physical qubits are not fully directly connected in a quantum computer. For example,
if in a superconducting machine two qubits are not close to each other and there is
no physical connection between them, we need to swap one of them closer to the other
before applying a two-qubit gate between them.


* Spacetime volume. At a high level, it represents the "space × time" of an algorithm. We
can also define the quantum volume for a quantum computing machine as #qubits ×
depth. Quantum volume often serves as a quantity with which we can compare the
performance of two different machines. Note that the error rate and the topology also
contribute to the quantum volume.
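
The first three metrics and the spacetime volume can be computed directly from a gate list. The sketch below assumes a hypothetical circuit representation (a list of `(name, qubits)` pairs) and assumes that every gate takes exactly one time step and is scheduled as early as possible.

```python
def circuit_metrics(gates):
    """Compute width, gate count, depth, and spacetime volume of a circuit.

    `gates` is a list of (name, qubits) pairs in program order.
    """
    busy_until = {}          # qubit -> first time step at which it is free
    depth = 0
    for _name, qubits in gates:
        # A gate can start only when all of its qubits are free.
        start = max(busy_until.get(q, 0) for q in qubits)
        for q in qubits:
            busy_until[q] = start + 1
        depth = max(depth, start + 1)
    width = len(busy_until)
    return {
        "qubit_count": width,
        "gate_count": len(gates),
        "depth": depth,
        "spacetime_volume": width * depth,
    }

# Hypothetical 3-qubit circuit.
example = [
    ("H", (0,)),
    ("H", (2,)),
    ("CNOT", (0, 1)),
    ("CNOT", (1, 2)),
    ("X", (0,)),
]
print(circuit_metrics(example))
# {'qubit_count': 3, 'gate_count': 5, 'depth': 3, 'spacetime_volume': 9}
```

The depth here is the length of the critical path: the two CNOTs share qubit 1 and therefore cannot overlap, while the final X on qubit 0 runs in parallel with the second CNOT.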


3.2 GATE-BASED QUANTUM ALGORITHMS


To many, the most intriguing discovery in quantum computing is the fact that quantum algorithms
can solve certain problems that are inefficient or infeasible on a classical computer.
In this section, we highlight a selection of interesting problems to provide the reader with a
general sense of the key ingredients in quantum algorithms. For example, the Deutsch-Jozsa
algorithm [143] is one of the first quantum algorithms with a demonstrated quantum advantage, that is, a
gap exists between the computational complexity of the best-known classical algorithm and that of
the quantum one. Another example we introduce in this section is the Bernstein-Vazirani
algorithm [9]. Both algorithms show impressive speedup over the best-known classical algorithms,
but there are several caveats. First, neither problem has known applications, as they solve very
specific, rather contrived mathematical problems. Second, the comparisons were done using
the query model, which is not exactly how classical functions are usually evaluated and is not
directly related to the familiar concept of running time. Third, our analyses are based on comparing
quantum algorithms with classical deterministic algorithms. Many examples here can be made
more efficient with randomized algorithms. But it is generally believed that for any problem $p$,
its quantum query complexity is smaller than (or equal to) its classical randomized complexity,
which is smaller than (or equal to) its classical deterministic complexity. Nonetheless, it remains
an open problem to determine the exact relation between the three models of computation.

Figure 3.3: Feynman path diagram of Deutsch's algorithm.


3.2.1 DEUTSCH-JOZSA ALGORITHM
Let us begin with a simple quantum algorithm called “Deutsch’s algorithm,” and then move on 
to its generalized version. 


Deutsch’s Algorithm [6] 
Problem statement: Given two unknown bits $b_0$, $b_1$ and an oracle that implements a function $f$
s.t. $f(0) = b_0$, $f(1) = b_1$, we want to determine the “parity” of $b_0$ and $b_1$, i.e., whether $b_0$
and $b_1$ are different or the same.


Classical solution: We can use two queries, $f(0)$ and $f(1)$, and then compare the results to
determine the parity of $b_0$, $b_1$.


Quantum solution: We use a quantum circuit that applies a Hadamard gate, the phase oracle, and
a second Hadamard gate to a single qubit initialized to $|0\rangle$, followed by a measurement (in the
simplified phase oracle model).

Let us analyze the circuit with the Feynman path diagram depicted in Figure 3.3.
The quantum states for each step in the circuit are listed below:

1. $|0\rangle$

2. $|+\rangle = \frac{1}{\sqrt{2}}(|0\rangle + |1\rangle)$

3. $\frac{1}{\sqrt{2}}\left((-1)^{f(0)}|0\rangle + (-1)^{f(1)}|1\rangle\right)$

4. $(-1)^{f(0)}|0\rangle$ if $f(1) = f(0)$, and $(-1)^{f(0)}|1\rangle$ if $f(1) \neq f(0)$


The constant factor $(-1)^{f(0)}$ is called the global phase of a qubit, which does not matter
when measured. We can see that the result of the final measurement will always be 0 if
$b_0 = b_1$ and will always be 1 otherwise. Thus, the problem is solved within one query of
the oracle, as opposed to two queries in the classical case.
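
The four steps listed above can be traced with a two-amplitude state vector. This is a sketch in the simplified phase oracle model; the function names are illustrative.

```python
import math

def hadamard(state):
    """Apply H to a single-qubit state [a, b]."""
    a, b = state
    s = 1 / math.sqrt(2)
    return [s * (a + b), s * (a - b)]

def phase_oracle(state, f):
    """Apply |x> -> (-1)^f(x) |x> for x in {0, 1}."""
    return [(-1) ** f(0) * state[0], (-1) ** f(1) * state[1]]

def deutsch(f):
    """Return 0 if f(0) == f(1) and 1 otherwise, using one oracle call."""
    state = [1.0, 0.0]                 # step 1: |0>
    state = hadamard(state)            # step 2: |+>
    state = phase_oracle(state, f)     # step 3: single query
    state = hadamard(state)            # step 4: interfere the two paths
    prob_zero = abs(state[0]) ** 2
    return 0 if prob_zero > 0.5 else 1

# All four one-bit functions: two constant, two balanced.
functions = {
    "always 0": lambda x: 0,
    "always 1": lambda x: 1,
    "identity": lambda x: x,
    "negation": lambda x: 1 - x,
}
results = {name: deutsch(f) for name, f in functions.items()}
print(results)
# {'always 0': 0, 'always 1': 0, 'identity': 1, 'negation': 1}
```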


Deutsch-Jozsa Algorithm [143]

The constant speedup shown in Deutsch's algorithm may not seem very impressive. However,
the following generalization is certainly eye-catching, for it has a striking exponential speedup
over any classical deterministic solution.


Problem statement: We have an oracle implementing a function $f : \{0,1\}^n \to \{0,1\}$ which is
promised to be either constant ($f(x)$ is always 0 or $f(x)$ is always 1) or balanced ($f(x) = 0$ for
half of the inputs $x$, and $f(x) = 1$ for the other half). We want to determine whether it
is constant or balanced.


Classical solution: In the classical world, we need to query $2^{n-1} + 1$ times (in the worst case),
covering inputs for more than half of the domain $\{0,1\}^n$, in order to determine whether $f$
is balanced or constant. Should we use a randomized algorithm, we can achieve a bounded
error $\epsilon$ on the result in $O(\log(1/\epsilon))$ queries.


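Quantum solution [143]: We use a similar circuit to the one for Deutsch's algorithm:

[Circuit: each of the $n$ qubits is initialized to $|0\rangle$ and passes through a Hadamard gate $H$, the
oracle $O_f$ acting on all $n$ qubits, and a second Hadamard gate $H$, followed by a measurement.]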


Again, we can track the quantum state with a Feynman path diagram (see Figure 3.4).
We write down the quantum states after each step of the algorithm as follows:

1. $|0\rangle^{\otimes n}$

2. $H^{\otimes n}$: $\frac{1}{\sqrt{2^n}} \sum_{x \in \{0,1\}^n} |x\rangle$

3. $O_f^{\pm}$: $\frac{1}{\sqrt{2^n}} \sum_{x \in \{0,1\}^n} (-1)^{f(x)} |x\rangle$

4. $H^{\otimes n}$: $\frac{1}{2^n} \sum_{y \in \{0,1\}^n} \sum_{x \in \{0,1\}^n} (-1)^{f(x) + x \cdot y} |y\rangle$

Figure 3.4: Feynman path diagram for the Deutsch-Jozsa algorithm.

We only focus on the amplitude of the quantum state $|0 \ldots 0\rangle$, which is
$\frac{1}{2^n} \sum_{x \in \{0,1\}^n} (-1)^{f(x)}$. We can see that if $f$ is
constant, the amplitude is 1 (up to global phase), and if $f$ is balanced, the amplitude is
0 (as $\sum_{x \in \{0,1\}^n} (-1)^{f(x)}$ cancels out). This means that after we measure the output, if the
result is $0 \ldots 0$, $f$ must be constant, and if the result is something other than $0 \ldots 0$, $f$ must
be balanced. The algorithm solves the problem with only one query, which means we have
an exponential speedup over the classical deterministic algorithm ($2^{n-1} + 1$ queries).
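
The argument about the amplitude of $|0 \ldots 0\rangle$ can be checked with a small state-vector simulation. This is a sketch in the phase oracle model; `hadamard_all` applies $H$ to every qubit using a fast Walsh-Hadamard transform, and the example functions are hypothetical.

```python
import math

def hadamard_all(state):
    """Apply H to every qubit of an n-qubit state (fast Walsh-Hadamard transform)."""
    state = list(state)
    size = len(state)
    half = 1
    while half < size:
        for start in range(0, size, 2 * half):
            for j in range(start, start + half):
                a, b = state[j], state[j + half]
                state[j], state[j + half] = a + b, a - b
        half *= 2
    norm = math.sqrt(size)
    return [amp / norm for amp in state]

def deutsch_jozsa(f, n):
    """Classify f as 'constant' or 'balanced' with one phase-oracle query."""
    state = [0.0] * (2 ** n)
    state[0] = 1.0                                        # |0...0>
    state = hadamard_all(state)                           # uniform superposition
    state = [(-1) ** f(x) * amp for x, amp in enumerate(state)]   # oracle
    state = hadamard_all(state)
    prob_all_zero = abs(state[0]) ** 2
    return "constant" if prob_all_zero > 0.5 else "balanced"

n = 4
print(deutsch_jozsa(lambda x: 1, n))                      # constant
print(deutsch_jozsa(lambda x: x & 1, n))                  # balanced
print(deutsch_jozsa(lambda x: bin(x).count("1") % 2, n))  # balanced
```

Note that the classical simulation itself takes time exponential in $n$; the single query refers to the number of oracle applications in the quantum circuit.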


3.2.2 BERNSTEIN-VAZIRANI ALGORITHM


In the (non-recursive) Bernstein-Vazirani algorithm [9], we will see a polynomial, rather than
exponential, speedup over the classical algorithm.

Problem statement: We are given oracle access to $f : \{0,1\}^n \to \{0,1\}$ and a promise that
$f(x) = s \cdot x = \sum_i s_i x_i \bmod 2$, where $s$ is a secret string that the algorithm is
trying to learn.


Classical solution: Classically, we have to learn $s$ the brute-force way, by querying $n$ inputs,
i.e.,

$$\begin{aligned} f(100 \ldots 0) &= s_1 \\ f(010 \ldots 0) &= s_2 \\ &\;\;\vdots \\ f(000 \ldots 1) &= s_n \end{aligned}$$


Quantum solution: In quantum computation, we can do this in just one query as shown in the
circuit below:

Notice that the circuit is identical to the one used in the Deutsch-Jozsa algorithm; the only
difference is in the oracle and the interpretation of the measurement results.


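The states at different levels are:


1. $|0\rangle^{\otimes n}$


2. $H^{\otimes n}$: $\frac{1}{\sqrt{2^n}} \sum_{x \in \{0,1\}^n} |x\rangle$


3. $O_f^{\pm}$: $\frac{1}{\sqrt{2^n}} \sum_{x \in \{0,1\}^n} (-1)^{f(x)} |x\rangle = \frac{1}{\sqrt{2^n}} \sum_{x \in \{0,1\}^n} (-1)^{s \cdot x} |x\rangle$

$= \frac{1}{\sqrt{2}}\left(|0\rangle + (-1)^{s_1} |1\rangle\right) \otimes \frac{1}{\sqrt{2}}\left(|0\rangle + (-1)^{s_2} |1\rangle\right) \otimes \cdots \otimes \frac{1}{\sqrt{2}}\left(|0\rangle + (-1)^{s_n} |1\rangle\right).$

The state of the $i$th qubit thus depends on $s_i$: if $s_i = 0$ (or 1) then qubit $i$ is $|+\rangle$ (or
$|-\rangle$).

4. $H^{\otimes n}$: qubit $i$ is $|0\rangle$ if $s_i = 0$ and $|1\rangle$ if $s_i = 1$, so the measurement outcome is $s$.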
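
The levels above can be reproduced with a short state-vector simulation that also runs the classical $n$-query strategy for comparison. This is a sketch in the phase oracle model; the secret string and the helper names are assumptions made for illustration, and bit $i$ of an integer stands for the $i$th bit of the string.

```python
import math

def hadamard_all(state):
    """Apply H to every qubit of an n-qubit state (fast Walsh-Hadamard transform)."""
    state = list(state)
    size = len(state)
    half = 1
    while half < size:
        for start in range(0, size, 2 * half):
            for j in range(start, start + half):
                a, b = state[j], state[j + half]
                state[j], state[j + half] = a + b, a - b
        half *= 2
    norm = math.sqrt(size)
    return [amp / norm for amp in state]

def make_oracle(s):
    """f(x) = s . x mod 2, with bit strings encoded as integers."""
    return lambda x: bin(s & x).count("1") % 2

def bernstein_vazirani(f, n):
    """Recover s with a single phase-oracle query."""
    state = [0.0] * (2 ** n)
    state[0] = 1.0
    state = hadamard_all(state)
    state = [(-1) ** f(x) * amp for x, amp in enumerate(state)]
    state = hadamard_all(state)
    # All probability is concentrated on the basis state |s>.
    return max(range(2 ** n), key=lambda y: abs(state[y]) ** 2)

def classical_recover(f, n):
    """Recover s with n queries, one per unit vector."""
    return sum(f(1 << i) << i for i in range(n))

n = 4
secret = 0b1011                      # hypothetical secret string
f = make_oracle(secret)
print(bernstein_vazirani(f, n) == secret, classical_recover(f, n) == secret)
# prints True True
```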








So far, we have assumed that such an oracle as used in the Bernstein-Vazirani algorithm exists
without worrying about its practical implementation. Details about implementing oracles can
be found in Chapter 6, Section 6.1.2.


3.3 NISQ QUANTUM ALGORITHMS


NISQ algorithms are a class of quantum algorithms that are believed to be useful in the NISQ era,
in which technology limits the quantity and quality of qubits. Resources in a NISQ machine are
scarce, and not sufficient for applying existing QEC techniques. In the following, we show how
NISQ algorithms boost their resilience to noise. In particular, we discuss two leading NISQ algorithms:


1. Variational Quantum Eigensolver (VQE) [46, 144, 145] and 
2. Quantum Approximate Optimization Algorithm (QAOA) [68] 


3.3.1 VARIATIONAL QUANTUM EIGENSOLVER (VQE) 


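The key to understanding the VQE algorithm is the variational principle, which says that for any
given normalized vector $|\psi\rangle$,

$$\langle\psi| H |\psi\rangle \geq E_0.$$

Here $H$ is Hermitian, which implies that $H$ has a complete set of eigenvectors that are orthonormal
to each other. Let $|E_0\rangle, |E_1\rangle, \ldots, |E_n\rangle$ be the eigenvectors of $H$ with eigenvalues $E_0, E_1, \ldots, E_n$,
respectively, with $E_0$ being the smallest. Then any given vector can be written as a superposition
of these eigenvectors, i.e.,


$$|\psi\rangle = c_0 |E_0\rangle + c_1 |E_1\rangle + \cdots + c_n |E_n\rangle.$$

So for any given normalized $|\psi\rangle$ (i.e., $\sum_i |c_i|^2 = 1$), we have


$$\begin{aligned} \langle\psi| H |\psi\rangle &= \left(c_0^* \langle E_0| + c_1^* \langle E_1| + \cdots + c_n^* \langle E_n|\right) H \left(c_0 |E_0\rangle + c_1 |E_1\rangle + \cdots + c_n |E_n\rangle\right) \\ &= |c_0|^2 E_0 + |c_1|^2 E_1 + \cdots + |c_n|^2 E_n \\ &\geq E_0. \end{aligned}$$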


As shown in Figure 3.5, to find the lowest eigenvalue, we keep checking the energy
$E = \langle\psi| H |\psi\rangle$ for different choices of $|\psi\rangle$ until we reach the minimum. The variational
principle guarantees that every guess gives an upper bound on the true lowest value, so lowering
the energy moves us closer to it. In VQE, we parametrize $|\psi\rangle$ using $\theta$ and find the value
of $\theta$ that gives the smallest value of $\langle H \rangle$, which is similar to classical optimization. This can
be summarized as guess and check, where the check is measuring $\langle H \rangle$ and the guess is
using a classical optimizer to select the next value of $\theta$ based on previous results. Here, $|\psi(\theta)\rangle$
is the ansatz, and we have two ways of choosing the ansatz: (i) hardware/machine ansatz and
(ii) problem/physics ansatz.

One good analogy is learning Ping Pong. The two kinds of ansatz can be understood as different
approaches that one can take to learn it. One strategy would be to go to a coach and learn
from first principles, which is similar to the problem/physics ansatz. The second approach is
building up from what you already know, for example tennis, which is similar to the
hardware/machine ansatz. Both have their pros and cons, and the
idea is to select the ansatz based on the problem. If our choice of ansatz doesn't matter much,
then we can go for the one that has the best machine performance. But if the choice of the
ansatz matters, then we have to design the circuit accordingly.

Figure 3.5: Illustration of a variational quantum algorithm that alternates between a quantum
circuit and a classical optimizer. In this process, the quantum circuit (parameterized by $\theta$)
evaluates some cost function $E[\theta]$, and the classical optimizer performs gradient descent to
choose the next set of parameters.
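
The guess-and-check loop can be sketched for a single qubit. Everything here is an assumption chosen for illustration: the Hamiltonian $H = Z + 0.5X$, the one-parameter ansatz $R_y(\theta)|0\rangle$, and a finite-difference gradient descent as the classical optimizer. On real hardware the energy would be estimated from repeated measurements; here it is computed exactly.

```python
import math

# Hypothetical single-qubit Hamiltonian H = Z + 0.5 X (real symmetric, hence Hermitian).
H = [[1.0, 0.5], [0.5, -1.0]]

def ansatz(theta):
    """|psi(theta)> = Ry(theta)|0> = cos(theta/2)|0> + sin(theta/2)|1>."""
    return [math.cos(theta / 2), math.sin(theta / 2)]

def energy(theta):
    """The 'check': <psi(theta)| H |psi(theta)>, computed exactly."""
    psi = ansatz(theta)
    return sum(psi[i] * H[i][j] * psi[j] for i in range(2) for j in range(2))

def vqe(theta=1.0, learning_rate=0.4, steps=200, eps=1e-4):
    """The 'guess': gradient descent with a finite-difference gradient."""
    for _ in range(steps):
        grad = (energy(theta + eps) - energy(theta - eps)) / (2 * eps)
        theta -= learning_rate * grad
    return theta, energy(theta)

theta_opt, e_min = vqe()
exact_ground_energy = -math.sqrt(1.0 + 0.5 ** 2)   # smallest eigenvalue of H
print(round(e_min, 6), round(exact_ground_energy, 6))
```

In line with the variational principle, `energy(theta)` never drops below the exact ground energy for any `theta`; the optimizer only moves the upper bound down until it meets it.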


3.3.2 QUANTUM APPROXIMATE OPTIMIZATION ALGORITHM (QAOA)

Both VQE and QAOA solve the same problem, which is finding the lowest eigenvalue of a
Hermitian matrix $H$. In the context of quantum simulations, such a matrix $H$ is called the
Hamiltonian of a quantum system. The Hamiltonian is an operator that tells us about the energy
of the system at time $t$. Since $H$ is a Hermitian matrix, it has real eigenvalues and orthonormal
eigenvectors:

$$H |E_n\rangle = E_n |E_n\rangle.$$


Out of all these eigenvalues, the lowest one is called the ground state energy. The challenge
is to find the ground state of the $2^n \times 2^n$ Hamiltonian matrix.

Figure 3.6: A MaxCut of a graph, where edges are cut if they connect a gray node with a white
node.

MaxCut/Clustering is one example where we can use QAOA [68] to find the solution.
In this problem, we are required to maximize the total weight of the edges crossing the cut. A cut
is made on an edge whenever it connects vertices of two different colors. This can be
rewritten as an eigenvalue problem with a Hamiltonian given by


$$H = -\sum_{\text{edges } (i,j)} \frac{1}{2} \left(I - Z_i Z_j\right).$$


Here, $Z_i$ and $Z_j$ are the Pauli $Z$ matrices for the $i$th and $j$th vertices. Finding the maximum cut
is then the same as finding the ground state of this Hamiltonian. For example, we can consider just
2 vertices connected by an edge. Then the Hamiltonian is given by


$$H = -\frac{1}{2} \left[ \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix} - \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix} \otimes \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix} \right] = \begin{pmatrix} 0 & 0 & 0 & 0 \\ 0 & -1 & 0 & 0 \\ 0 & 0 & -1 & 0 \\ 0 & 0 & 0 & 0 \end{pmatrix}.$$


This matrix has eigenvector $|00\rangle$ (no crossing) with eigenvalue 0 and eigenvector $|01\rangle$ (with crossing)
with eigenvalue $-1$. Here 0 corresponds to an edge connecting the same color and $-1$ represents
one that connects different colors. Figure 3.6 is an example MaxCut of a graph of five
nodes.
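
Since this Hamiltonian is diagonal in the computational basis, its eigenvalues can be read off by evaluating each coloring directly. The sketch below does this by brute force for a hypothetical five-node ring graph with unit edge weights; it illustrates the cost function that QAOA would minimize, not QAOA itself.

```python
def maxcut_energy(assignment, edges):
    """Eigenvalue of H = -sum 1/2 (I - Z_i Z_j) on the basis state |assignment>.

    Bit i of `assignment` is the color of vertex i.
    """
    energy = 0.0
    for i, j in edges:
        zi = 1 - 2 * ((assignment >> i) & 1)   # color 0 -> +1, color 1 -> -1
        zj = 1 - 2 * ((assignment >> j) & 1)
        energy -= 0.5 * (1 - zi * zj)          # -1 for every cut edge
    return energy

def hamiltonian_diagonal(n, edges):
    """Diagonal of H over all 2^n colorings."""
    return [maxcut_energy(a, edges) for a in range(2 ** n)]

# Two vertices joined by one edge: reproduces the 4 x 4 matrix in the text.
two_node = hamiltonian_diagonal(2, [(0, 1)])
print(two_node)  # [0.0, -1.0, -1.0, 0.0]

# Hypothetical five-node ring graph.
ring = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 0)]
diagonal = hamiltonian_diagonal(5, ring)
ground_energy = min(diagonal)
best_coloring = diagonal.index(ground_energy)
print(ground_energy, format(best_coloring, "05b"))
```

The ground energy is the negative of the maximum number of cut edges; for an odd ring one edge always stays uncut, so a five-node ring gives $-4$.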


3.4 SUMMARY AND OUTLOOK 


Quantum computing opened an entirely new way of solving computational problems efficiently.
It is so far the only model that violates the extended Church-Turing thesis: remarkably, a quantum
computer can solve some computational tasks in exponentially fewer steps than the best
classical computer. Large gate-based algorithms such as Shor's algorithm [11, 12] and Grover's
algorithm [13] have shown the practical potential of quantum computers. These algorithms have also
generated public concerns in computer and network security: current public key cryptosystems
rely on the hardness of factoring large numbers and computing discrete logarithms, which can
be solved exponentially faster on an idealized quantum computer.

In the near term, quantum devices are still limited in fidelity and size. These NISQ devices
[14] are not fault tolerant and thus cannot implement the algorithms developed for ideal
quantum computers.

Due to the resource limitations of NISQ machines, benchmarking is typically performed
on small gate-based algorithms (such as the Deutsch-Jozsa algorithm and the Bernstein-Vazirani
algorithm), variational quantum eigensolvers (applied to various electronic systems),
quantum approximate optimization algorithms (applied to small problem instances), and small
random circuits. One of the biggest challenges of the NISQ era is to develop algorithms that
run on NISQ computers.

Remarkably, the competition between quantum and classical algorithms has formed
a virtuous cycle. A recent example is the invention of the HHL algorithm for solving linear
systems [146], which was thought to have an exponential speedup over any classical algorithm.
Subsequently, a quantum-inspired classical algorithm [147] was discovered with significant improvement over
existing classical algorithms, matching its quantum counterpart in some cases. Despite the
loss of quantum advantage in those cases, the consideration of a quantum solution has advanced
our understanding of the nature of this problem, resulting in a classical improvement.


Further Reading 

The focus of this book is on near-term applications. A promising approach to reduce the cost 
of quantum algorithms is to use an approximate or heuristic approach to solve problems, giving 
rise to the hybrid classical-quantum algorithms, such as variational algorithms for simulating 
molecules and materials [46, 144, 145, 148], optimization algorithms [68], and machine learn- 
ing [149-151]. 

A near-term milestone for quantum computing is a demonstration of quantum advantage, 
sometimes referred to as quantum supremacy, where experimentalists want to show a quan- 
tum computer can solve some problem exponentially faster than the best classical computer. To 
demonstrate this, one would need not only to build a quantum machine powerful enough to 
run experiments, but also to choose a test problem that is simple enough for a quantum com- 
puter but hard enough for a classical one. Aaronson and Arkhipov proposed in 2010 the Boson 
Sampling algorithm [152]. A small experiment with six photons has been implemented [153], 
but demonstration at a larger scale remains challenging [154]. Google proposed to use random
circuit sampling to demonstrate quantum supremacy [155], whose classical hardness was later
made rigorous by Bouland et al. [156].


CHAPTER 4


Optimizing Quantum 
Systems—An Overview 


The second part of this book is dedicated to techniques for optimizing quantum computing at a
systems level. In this chapter, we describe the layers of a quantum computer system. In subse- 
quent chapters, we give examples of key optimizations that work across these layers, strategically 
trading software complexity for compute efficiency. 

Developments in the theory of quantum algorithms and the implementation of quan- 
tum hardware in the past few years have been truly remarkable. But there are still formidable 
challenges lying ahead. So far, an enormous gap exists between the resources required by many 
discovered algorithms, and the resources available in today’s devices. We will have to learn to 
execute large quantum algorithms under highly-constrained conditions. It is of paramount im- 
portance that we optimize for the resource consumption and success rate of a quantum program 
via sharing of information throughout the software-hardware stack. Such information includes 
characteristics of the target application and the underlying hardware, for example. An overar- 
ching theme in quantum computer systems research in the NISQ era will be software-hardware 
co-design, or in other words, vertical integration across the systems layers. A family of techniques 
across many layers will be needed. Each and every optimization will play a vital role in enabling 
practical quantum computing. Indeed, this is the emphasis of this book. 


4.1 STRUCTURE OF QUANTUM COMPUTER SYSTEMS


The aim of this section is to provide a bird’s-eye view of the key components in quantum 
computer systems. After introducing the architecture layers of a quantum computer, we shed light on 
where research opportunities lie in the NISQ era for computer system researchers and engineers. 
Quantum computing is at a similar stage of development as classical computing in the 1950s. 
Today’s QC systems consist roughly of three essential components, namely the three layers in 
quantum computer architecture: application layer, systems software layer, and hardware layer, as 
shown in Figure 4.1. Today's classical computer systems manage highly complex hardware and 
software through layering abstractions. Going up through the systems stack, each layer hides 
some implementation details and exposes a manageable set of controls for the next layer. 

In contrast, the development of quantum computer systems is still at its nascent stage. 
This is great for researchers, because there are so many interesting problems to be solved. It 
also means that resources are very scarce and that we are motivated to break abstractions and 
pay for efficiency with greater software complexity. Even classical computing is backsliding a bit 
toward less abstraction as the end of Dennard scaling puts pressure on architectures to become 
more efficient. How much of what we learn in the next five years will carry forward to a 
future of much larger quantum machines? Perhaps more than we might think, as it would be 
hard to imagine a future in which qubits and quantum operations are not costly. A functional 
quantum computer requires painstaking attention to the isolation and control over many qubits. 
Some physical details may always be exposed. The experience and lessons we learn about how 
to manipulate qubits in NISQ computers, be it at the algorithmic, systems, or hardware level, 
will pave the way for larger fault-tolerant quantum devices in the future. Noise resilience is not 
only for experimentalists who build the hardware to worry about; opportunities are ubiquitous 
in the entire systems stack. In fact, it is crucial for algorithm designers, systems architects, and 
software developers to take responsibility for tackling this challenge together. It is expected that, 
in the NISQ era, a QC toolchain must break the traditional abstraction layers and use aggressive 
optimizations throughout the full systems stack. The key to successful execution of quantum 
algorithms on NISQ devices is to selectively share information across layers of the stack (from 
device specifics to application characteristics) such that programs can use the limited qubits most 
efficiently. 

Figure 4.1: Selective sharing of information allows algorithms to use the limited resources in NISQ 
hardware most efficiently. 


4.2 QUANTUM-CLASSICAL CO-PROCESSING 


An important variation of quantum computing systems is their use as specialized hardware ac- 
celerators within a classical computation. Indeed, this hybrid co-processing approach will likely 
be the dominant structure of quantum systems for the foreseeable future. 

While quantum computers are (currently) small and unreliable, a great way to exploit 
their special, but limited, abilities is to adopt a hybrid model [145] which leverages both quan- 
tum and classical computation. Almost all useful algorithms require some amount of classical 
pre-processing or post-processing. For example, Shor's algorithm has a series of classical arith- 
metic operations before and after the quantum order-finding subroutine. But perhaps the most 
promising example is in quantum chemistry, where Variational Quantum Eigensolver (VQE) 
algorithms perform a kind of heuristic search by iterating between a quantum machine and a 
classical supercomputer. The goal is to find the lowest energy state of a chemical compound (the 
ground state). As shown in Figure 3.5 of Chapter 3, we start from the best-known configuration 
of electrons from a classical computer and estimate the energy of that configuration using the 
quantum machine. This estimate is then given back to a classical computer to guide its search to- 
ward a configuration with lower energy. In this way, the quantum machine acts as an accelerator 
for the energy modeling part of the computation. By solving for lowest energy under different 
configurations and constraints, we can explore a range of molecular reactions. 
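
The iteration between the two machines can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the VQE implementation of any particular system: `estimate_energy` is a hypothetical stand-in for the quantum machine (it evaluates a one-qubit toy Hamiltonian H = Z + 0.5X exactly instead of sampling hardware), and `classical_search` is a simple step-halving search that plays the role of the classical optimizer.

```python
import math


def estimate_energy(theta):
    """Stand-in for the quantum machine (assumption): energy of Ry(theta)|0>
    under the toy Hamiltonian H = Z + 0.5 X, computed exactly."""
    a0, a1 = math.cos(theta / 2), math.sin(theta / 2)  # amplitudes of Ry(theta)|0>
    exp_z = a0 * a0 - a1 * a1                           # <Z>
    exp_x = 2 * a0 * a1                                 # <X>
    return exp_z + 0.5 * exp_x


def classical_search(theta0, step=0.4, iters=80):
    """Classical outer loop: keep any neighboring angle with lower energy,
    and halve the step size whenever neither neighbor improves."""
    theta, energy = theta0, estimate_energy(theta0)
    for _ in range(iters):
        improved = False
        for candidate in (theta + step, theta - step):
            e = estimate_energy(candidate)   # one "quantum" evaluation
            if e < energy:
                theta, energy, improved = candidate, e, True
        if not improved:
            step /= 2
    return theta, energy


theta_opt, ground_energy = classical_search(theta0=0.0)
print(theta_opt, ground_energy)
```

For this toy Hamiltonian the exact ground energy is -sqrt(1.25), so the result of the search can be checked directly. On real hardware each call to the estimator would be a batch of short circuit executions, which is why the quality of the classical initial guess matters.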

This hybrid example has some great advantages. First, it sidesteps the "innovator's dilemma" 
by leveraging an initial guess derived from our incumbent classical technology, rather than di- 
rectly competing with that technology. Second, hybrid algorithms break a long program into 
multiple iterations of short programs, which allows us to effectively utilize the limited number 
of instructions a quantum machine can reliably execute. Third, it allows us to pick small but clas- 
sically challenging problems (chemical compounds) that can be represented in a small number 
of quantum bits. To determine which orbitals the electrons are in, Nature uses only $n$ 
electrons to “model” $n$ electrons. Classical computers require combinatorially many bits, on the 
order of $k^n$, to model $n$ electrons, whereas quantum computers need only $kn$ qubits. Fourth, we 
have a clear measure of success, as we know that classically-computed ground state energy can be 
significantly higher than experimentally-observed values, even for small compounds. If our hy- 
brid approach can get closer to experimental values, then our quantum machine has helped com- 
pute something not computable classically! Long-term, improved understanding of molecular 
reactions could lead to better materials, more efficient photovoltaics, and lower-energy fertilizer 
production. 

Even as quantum machines scale, quantum algorithms are likely to be specialized, making 
the quantum device a very domain-specific accelerator. Most practical applications will still 
require a combination of general classical and specialized quantum processing to be useful. 

The hybrid approach implies a number of interesting research challenges for system de- 
signers. Traditional quantum algorithms can be statically compiled with a high level of optimiza- 
tion using known input parameters. With hybrid algorithms, some of a quantum program’s input 
parameters can change each iteration. For example, a compiler may spend hours optimizing for 
quantum instructions that include quantum rotations for specific input angles to solve a chem- 
istry problem, but now we find that the angles change every iteration. This suggests that we need 
a partial compilation strategy in which programs are optimized for unchanging parameters, but 
then quickly re-optimized each iteration for parameters that change. 
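
The idea of partial compilation can be illustrated with a deliberately small sketch. It assumes a single-qubit program in which the string `"theta"` marks a parameterized Rz rotation; `precompile` and `bind` are hypothetical names, and real compilers operate on multi-qubit circuits or pulses rather than 2x2 matrices. The expensive step folds every run of fixed gates into one block once; the cheap step is repeated each iteration.

```python
import cmath
import math


def matmul(a, b):
    """Multiply two 2x2 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


R = 1 / math.sqrt(2)
H = [[R, R], [R, -R]]
S = [[1, 0], [0, 1j]]


def rz(theta):
    return [[cmath.exp(-1j * theta / 2), 0], [0, cmath.exp(1j * theta / 2)]]


def precompile(gates):
    """One-time step: fold each maximal run of fixed gates into a single
    matrix, leaving the "theta" slots in place."""
    blocks, current = [], None
    for g in gates:
        if g == "theta":
            if current is not None:
                blocks.append(current)
                current = None
            blocks.append("theta")
        else:
            current = g if current is None else matmul(g, current)
    if current is not None:
        blocks.append(current)
    return blocks


def bind(blocks, theta):
    """Per-iteration step: fill in the angle and multiply the few blocks."""
    u = [[1, 0], [0, 1]]
    for b in blocks:
        u = matmul(rz(theta) if b == "theta" else b, u)
    return u


gates = [H, S, H, "theta", H, S, H]   # gates listed in order of application
blocks = precompile(gates)             # three blocks instead of seven gates
u_fast = bind(blocks, 0.3)
u_slow = bind(gates, 0.3)              # same result without the precompiled blocks
```

The two results agree, but `bind(blocks, ...)` touches three matrices per iteration instead of seven; the saving grows with the amount of parameter-independent work in the program.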

Hybrid algorithms also require more thought to be given to hardware and software com- 
munication mechanisms between quantum and classical hardware, as well as how such ensem- 
bles might be presented as compute resources to users. IBM was the first to make a physical 
quantum machine accessible via the cloud, which has greatly grown the quantum computing 
community and allowed research into how to adapt to the physical properties of real machines. 
The IBM machines, however, are cumbersome for hybrid computation, as the batch queue inter- 
face is really designed for stand-alone quantum programs and the latency to couple with classical 
computation is long. 


4.3 QUANTUM COMPILING 


A quantum compiler aims to efficiently express a high-level quantum program using instruc- 
tions that a quantum machine recognizes and natively supports, balancing practical architectural 
constraints. 

A quantum algorithm is implemented in a quantum domain-specific language (QDSL). 
The quantum compiler translates the high-level program into quantum assembly code (QASM) 
that can be executed on a target hardware. This is accomplished through a series of transformations 
and optimizations on a quantum intermediate representation (QIR) of a program. Finally, 
at the lowest level, machine-level instructions that orchestrate the hardware control pulses are 
scheduled and optimized. 

For a program to be realizable on a given hardware, a number of architectural constraints 
must be satisfied. This typically means considering the following practical aspects. 


• Instruction set. Only a limited number of quantum instructions are supported in a 
given architecture. A compiler should aim to translate high-level quantum programs 
using the supported instruction set. In most cases, this instruction set is 
“Clifford+T” gates, comprising the CNOT (controlled-NOT) gate, X (NOT) gate, 
H (Hadamard) gate, and T (π/8-phase) gate. This is a common set for most gate-based 
NISQ machines, as well as large-scale FT machines (e.g., with surface code error 
correction). Some NISQ compilers choose to target the physical analog pulses directly 
for improved hardware control. Detailed discussion of pulse control is deferred to 
Chapter 7. 


• Qubit communication. A quantum algorithm is hardly interesting if it can be implemented 
with only single-qubit gates, as two-qubit (or multi-qubit) gates provide the 
entangling power between qubits. Two-qubit gates are implemented by qubit-qubit 
interaction/communication. Qubit communication has different meanings in the 
NISQ vs. the FT contexts. In a NISQ machine, usually not all qubits can directly 
interact with each other; two qubits interact by moving closer to one another via a chain 
of swap gates until they are directly connected and hence allowed to interact. The time to 
complete a swap chain is proportional to the length of the chain. In FT machines, 
qubit interactions are accomplished through fault-tolerant operations that depend on 
the error correcting codes (such as braiding and lattice surgery in surface-code 
error-corrected devices¹). With today's technology, building large-scale quantum machines 
with all-to-all qubit connectivity has proven extremely challenging. The latest effort 
from IonQ [56] offers a machine with 11 fully connected qubits using trapped-ion 
technology. Superconducting machines, for instance by IBM [51] and Rigetti [158], 
typically have much lower connectivity. Any scalable proposal would involve an 
architecture of limited qubit connectivity and a model for resolving long-distance 
interactions, hence inducing communication costs. This constraint is sometimes referred to as 
"device topology." 


• Hardware noise. Another important consideration for compiling quantum programs 
is to minimize errors caused by hardware noise. Errors under consideration typically 
include memory errors (caused by decoherence of qubits) and gate errors (caused by 
imprecise control of gates). In general, the longer the program runs, the higher the 
chance that the qubits experience decoherence. The more gates are applied, the lower 
the chance that the program succeeds at the end. With today's technology, two-qubit 
gates prove to be challenging to implement, and hence are one of the dominant sources 
of error. A compiler normally aims to express a quantum program using fewer qubits, 
fewer gates, or shorter circuit depth. Note that these targets are non-exclusive and sometimes 
conflicting, in which case the compiler needs to balance them. 
More advanced noise-aware compilers have also been proposed. For example, in NISQ 
machines, some qubits are more robust than others, so picking the longer-lived 
qubits to perform important computation can improve the overall success rate. 


• Available parallel control. Depending on the technology that implements the qubits, a 
compiler can be constrained by the available parallelism. The parallelism limitation is 
usually the consequence of the hardware control mechanism or error mitigation protocols. 
For instance, the width of the tunable laser beams in a trapped-ion NISQ machine 
limits the number of independently controllable qubits, and thus the number of parallel 
single-qubit gates. Some error mitigation protocols dictate that gates may not be applied in 
parallel when they are physically located close to each other, which reduces crosstalk errors 
between them. 


¹Braiding and lattice surgery are techniques that implement gates between logical qubits on the surface code lattice. 
Details are omitted as they are out of the scope of the book; we refer the interested reader to Chapter 8 and other tutorials [22, 
27, 157] for basics of quantum error correction and topological quantum codes. 

Figure 4.2: A detailed quantum compilation flow outlining the transformations and optimizations 
involved in a generic compiler. 
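
The communication and noise constraints above interact, which a small sketch can make concrete. Everything specific here is an assumption for illustration: a hypothetical five-qubit line topology `LINE`, a cost of three CNOTs per SWAP, independent gate errors, and a made-up two-qubit gate fidelity of 0.99.

```python
from collections import deque


def distance(coupling, src, dst):
    """Breadth-first search: number of edges between two physical qubits."""
    seen, queue = {src}, deque([(src, 0)])
    while queue:
        node, dist = queue.popleft()
        if node == dst:
            return dist
        for nxt in coupling[node]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    raise ValueError("qubits are not connected")


def swaps_needed(coupling, a, b):
    """SWAPs required to bring qubit a next to qubit b along a shortest path."""
    return max(distance(coupling, a, b) - 1, 0)


def success_estimate(n_two_qubit_gates, two_qubit_fidelity):
    """Crude estimate: product of gate fidelities, assuming independent errors."""
    return two_qubit_fidelity ** n_two_qubit_gates


# Hypothetical device: five qubits in a line, 0 - 1 - 2 - 3 - 4.
LINE = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}

swaps = swaps_needed(LINE, 0, 4)     # route qubit 0 next to qubit 4
cnots = 3 * swaps + 1                # three CNOTs per SWAP, plus the gate itself
estimate = success_estimate(cnots, 0.99)
print(swaps, cnots, estimate)
```

Under these assumptions a single long-distance CNOT turns into ten physical two-qubit gates, so where the compiler places qubits directly affects the estimated success rate.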


Figure 4.2 illustrates a typical quantum compilation toolflow. At its core, the quantum 
compiler passes a high-level quantum program through a series of optimizations, generating 
the most efficient and robust low-level executable (i.e., sequence of classical and quantum in- 
structions) for the target hardware, balancing different architectural constraints. For historic 
reasons, more recent work typically targets NISQ-era architectures, but older work targets large 
FT architectures. Nonetheless, most techniques we introduce here generally apply to the dif- 
ferent architectures. As hinted in Chapter 2, compilation for quantum machines is very similar 
to classical circuit synthesis. In the classical setting, we take some high-level language (C-like, 
Verilog, etc.), and compile it all the way down to instructions for transistors. This similarity is 
not pure coincidence; after all, the quantum circuit model of computation is generalized from 
the Boolean circuit model. 


4.4 NISQ VS. FT MACHINES 


Quantum compiling in the contexts of the NISQ and FT eras can be drastically different. This section 
aims to name a few examples of such differences, so that one does not confuse a technique for 
NISQ machines with another for FT machines, and vice versa. 

Notably, quantum compiling in the NISQ era tends to be more dynamic. The emerging 
NISQ applications, such as the variational eigensolver and the quantum approximate optimiza- 
tion algorithm, have hybrid/interleaved classical and quantum processing—quantum circuits are 
parameterized, with the parameters optimized by a classical algorithm. The traditional model 
of compiling a static quantum program once therefore does not work well in the NISQ context. 
The work in [159] aims to save compilation cost by reusing partial synthesis results across 
the iterations of the algorithm. 

Another difference is in the topology of the architecture and the model for resolving two-qubit 
interactions. As a result, communication costs will differ. In the context of a NISQ machine, 
the most frequently used approach to resolve a long-distance two-qubit gate is to move 
one qubit closer to the other through a chain of swaps. A SWAP gate can easily be implemented by 
three CNOT gates. In an FT machine, such as with the surface code error corrected architecture, 
we can resolve long-distance interactions between logical qubits through a process called braid- 
ing (i.e., movement and transformation of qubits) [160, 161]. Braiding has very different cost 
models than those of swapping. For instance, braids can extend to arbitrary length and shape in 
constant time, given that they never cross other braids; latency (i.e., time cost) of a swap chain 
is proportional to the length of the chain. 
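
The three-CNOT construction of SWAP is easy to check. CNOT and SWAP both map computational basis states to basis states, so by linearity it suffices to compare their action on every basis state. The sketch below does this for two qubits; the function names are ours.

```python
from itertools import product


def cnot(bits, control, target):
    """Action of CNOT on a computational basis state (a tuple of 0/1 values)."""
    out = list(bits)
    out[target] ^= out[control]
    return tuple(out)


def swap_via_cnots(bits, a, b):
    """SWAP built from three CNOTs with alternating control and target."""
    for control, target in ((a, b), (b, a), (a, b)):
        bits = cnot(bits, control, target)
    return bits


def swap(bits, a, b):
    """Reference SWAP: exchange the two positions directly."""
    out = list(bits)
    out[a], out[b] = out[b], out[a]
    return tuple(out)


basis_states = list(product((0, 1), repeat=2))
agrees = all(swap_via_cnots(s, 0, 1) == swap(s, 0, 1) for s in basis_states)
print(agrees)
```

Because every hop in a swap chain costs three CNOTs executed in sequence, the latency of the chain grows linearly with its length.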

A third difference highlighted here is the choice of instruction set. Quantum circuit syn- 
thesis has been largely done in the context of Clifford+T gate set, due to its nice algebraic struc- 
tures. Although that is a reasonable choice for FT machines (as Clifford gates are straight- 
forward to implement fault-tolerantly for stabilizer error correction codes), it is not the ideal 
choice for NISQ machines. For example, NISQ machines can typically perform single-qubit 
rotations about one of the principal axes (e.g., z-rotations) to very high precision, but suffer 
on two-qubit gates such as CNOT gates. Discovering optimal device- or application-adapted 
synthesis algorithms remains an open problem. 

Last but not least, quantum compiling in the presence of noise has been under-studied. 
Integrating noise-awareness into circuit synthesis, gate scheduling, qubit mapping, pulse synthesis, 
and compiler validation is among the first challenges in quantum computer systems. 

CHAPTER 5 


Quantum Programming 
Languages 


The concepts of data and operations for quantum computers could be drastically different from 
those of classical computers. For instance, the superposition principle dictates that quantum 
data (or qubit states) are intrinsically probabilistic as the information stored in a qubit can only 
be partially read out through an irreversible process called measurement which yields a prob- 
abilistic outcome. Operations on quantum data stored in one part of the memory could affect 
the data in another remote part due to a property called entanglement. What is perhaps more 
surprising is that quantum data cannot be duplicated into two independent copies, known as 
the no-cloning theorem. On top of those, one more layer of complexity is added by the fragility 
of quantum states. QC systems are susceptible to decoherence (i.e., spontaneous loss of quan- 
tum information in qubits) and operational errors. These are just some examples of the unique 
properties presented in quantum programs. They influence how a quantum program needs to be 
executed. Quantum algorithms typically involve a hybrid of classical and quantum processing. 
As such, a common strategy in quantum programming language design is to adapt and augment 
conventional programming language semantics and type systems to express the new properties 
of quantum programs. 


5.1 LOW-LEVEL MACHINE LANGUAGES 


At a lower level, quantum hardware is controlled by instructions signaled by a classical host 
processor. The quantum assembly language (QASM) is a direct translation from a quantum circuit 
to a sequential description of quantum instructions for executing a quantum program. Although 
some existing low-level quantum languages are developed primarily with device independence 
and software portability in mind, more and more attention is paid to exposing device specifics, 
such as the hardware native gates, device connectivity, and noise models, to the language itself 
and to its software toolchain. 

One of the earliest low-level quantum languages is called QASM [162]. In the QASM 
language, a quantum program is described as a linear sequence of gate instructions. For example, 
the EPR pair creation circuit is written as shown in Figure 5.1. 
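
To make the "linear sequence of gate instructions" concrete, the following sketch interprets a two-instruction program that prepares an EPR pair. The instruction syntax (`h q0`, `cnot q0,q1`) is an illustrative assumption and is not claimed to match the listing in Figure 5.1; the interpreter is a toy state-vector simulator that omits measurement.

```python
import math


def apply_h(state, q):
    """Hadamard on qubit q of a state vector (qubit 0 is the lowest index bit)."""
    r = 1 / math.sqrt(2)
    new = [0.0] * len(state)
    for i, amp in enumerate(state):
        bit = (i >> q) & 1
        new[i & ~(1 << q)] += r * amp                       # |0> component
        new[i | (1 << q)] += r * amp * (-1 if bit else 1)   # |1> component
    return new


def apply_cnot(state, control, target):
    """CNOT permutes basis states: flip the target bit when the control bit is 1."""
    new = [0.0] * len(state)
    for i, amp in enumerate(state):
        j = i ^ (1 << target) if (i >> control) & 1 else i
        new[j] = amp
    return new


def run(program, n_qubits):
    """Execute the instructions one after another, starting from |0...0>."""
    state = [0.0] * (2 ** n_qubits)
    state[0] = 1.0
    for line in program.strip().splitlines():
        op, args = line.split(maxsplit=1)
        qubits = [int(q.strip().lstrip("q")) for q in args.split(",")]
        if op == "h":
            state = apply_h(state, qubits[0])
        elif op == "cnot":
            state = apply_cnot(state, qubits[0], qubits[1])
        else:
            raise ValueError(f"unsupported instruction: {op}")
    return state


EPR_PROGRAM = """
h q0
cnot q0,q1
"""

amplitudes = run(EPR_PROGRAM, n_qubits=2)
probabilities = [abs(a) ** 2 for a in amplitudes]
print(probabilities)
```

The resulting state has equal weight on |00> and |11> and none on |01> or |10>, which is the correlation an EPR pair is expected to show when both qubits are measured.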

Because of its roots in quantum circuits, the sequential QASM language is limited in 
modeling complex classical control, such as “repeat-until-success” procedures and other 
non-trivial branching. To remedy this, and to improve its expressiveness and coverage, a number 
of extensions to QASM have been developed. In these extensions, basic constructs (commonly 
used in classical programming) such as loops, subroutine calls, barriers, and classical feedback 
control are added. For example, the OpenQASM [163] backend has been developed by IBM 
Q, and ARTIQ [164] by the trapped ion community. 

Figure 5.1: The QASM code and circuit diagram for creating an EPR pair with measurements. 

Many have argued for a more expressive language to support full control over the physi- 
cal properties of the machine, such as pulse features. For example, OpenPulse [165] developed 
by IBM is one of the efforts in that direction. OpenPulse is a set of tools for building experiments 
out of pulses. The performance of the experiments relies heavily on the programmer’s 
understanding of the physical system. 

These low-level languages are naturally more closely tied to hardware specifics. Hence, 
optimizations for compiling to such languages must be tailored to the specific characteristics of 
their supported hardware, including device topology and noise rates. An efficient low-level 
software tool can allow quantum algorithms to be successfully executed on resource-constrained 
machines, such as NISQ computers. 


5.2 HIGH-LEVEL PROGRAMMING LANGUAGES 


A high-level programming language is needed to represent complex classical and quantum 
information processing in quantum algorithms. One notable example that we will return to later 
is representing and compiling hybrid classical-quantum computations. 

Designing a language that enables programmers to exploit these quantum properties on 
real hardware while maintaining usability remains challenging. Balancing between abstraction 
and detail is key. On one hand, exposing device specifics helps programmers write more efficient 
code, but on the other hand it dramatically increases the complexity of the language. In this 
section, we provide a short review of the rapid development of QC languages and software tools 
in recent years from both academia and industry. 

Recall from the beginning of this chapter that the unique properties, such as the 
probabilistic, entangling, no-cloning, and error-prone nature of quantum states, influence the 
execution model of quantum programs, and hence the design of a semantically safe programming 
language. Due to its complexity, a QC system requires fast and reliable classical control. We 
encourage the reader to revisit Chapter 1 for details about the classical-quantum co-processor 
model. Due to this hybrid nature of classical and quantum information processing, most 
existing quantum programming languages are themselves Domain-Specific Languages (DSLs). 
The common approach is either to embed directly in a classical base language or to follow a 
similar design. The benefits are apparent here: instead of writing an entirely new language and its 
software ecosystem, we can reuse parts of a widely used language and inherit some of its opti- 
mizations such as resolving control flow. Furthermore, programmers do not need to learn a new 
language from scratch. Nonetheless, many hope that a new language that follows more rigor- 
ously the theory of quantum information processing will aid the discovery of new algorithms. 
Some have proposed new data typing for qubits and a strongly typed language to ensure proper 
manipulations of qubits. 

Just as with classical programming languages, quantum programming languages can be 
classified into two categories: functional and imperative. A functional language encourages more 
mathematical, abstract, compact implementation of algorithms. Examples of functional quantum 
programming languages include Quipper [166], Quafl [167], LIQUi|⟩ (“Liquid”) [168], 
and Q# [169]. An imperative language describes the steps of algorithms sequentially in greater 
detail. It allows direct modifications of variables and often is more resource efficient. Examples 
of imperative quantum programming languages include Scaffold [170] embedded in C/C++, 
and ProjectQ [171] and Quil [172] embedded in Python. 

Current NISQ systems are rapidly evolving and are highly resource constrained. Any lan- 
guage (together with its compilation software) will need to be versatile enough to keep up with 
the fast rate of change in QC systems. For example, NISQ applications such as the Variational 
Quantum Eigensolver (VQE) require multiple rounds of interleaved classical-quantum processing, 
which presents new challenges in language design and compilation optimizations. 

Compared to hardware development and theoretical algorithmic understanding, our experience 
with QC programming language design and its supporting software toolchain is 
rather nascent. With recent pushes for cloud-based access to QC hardware in the industry (such 
as [84, 158, 173, 174] and more to come), more and more people realize the need for a full 
QC software and hardware stack. We expect that the growing developer community in quantum 
computing will pay more attention to the design of quantum programming languages and their 
software toolchain. 


5.3 PROGRAM DEBUGGING AND VERIFICATION 


How do we know a quantum program implements the transformation as intended? Can we 
prevent programmers from writing code that violates some quantum properties? Verification of 
quantum programs is a unique and non-trivial task, but we take a moment to discuss some ex- 
citing recent developments in program testing inspired by classical techniques, such as program 
debugging, formal logic, and proof assistants. 

There are two possible notions of verification in quantum computation: hardware verification 
and software verification. The former refers to the problem of verifying that hardware is 
capable of performing quantum logic operations as intended by a program. The latter refers to 
the problem where we want to verify that a quantum program is bug-free and implements the 
desired transformation. 


• Hardware Verification. We need tools to understand and characterize the machines that 
we build. At a basic level, the behavior of quantum devices can be characterized through 
a process termed quantum tomography (see reviews in [175]), where multiple measurements 
are used to estimate quantum states. As machines become larger, however, a 
systems-level approach is needed. One possible approach would be to compute and 
uncompute a circuit and use tomography to determine whether a machine returns to 
its initial state. More sophisticated tests attempt to measure the "quantumness" of a 
machine and its ability to create entanglement across its qubits. Validating quantumness 
in a machine using a (possibly purely classical) prover is a challenging task and 
is tightly related to computational complexity theory. For lack of space, this book 
omits some of the seminal work in verifying quantum hardware. We refer interested 
readers to the Summary section at the end of the chapter, as well as the reviews 
in [176-178]. 


• Software Verification. The purpose of software verification is typically two-fold: 
verifying that (i) high-level programs are bug-free, and that (ii) compiler transformations 
preserve logical equivalence. If we have a simulator or working machine, we can 
perform end-to-end unit tests or we can invest some extra quantum bits to test 
assertions. Methods [175] have been developed to test for basic properties such as whether 
two quantum states are equal, whether two states are entangled, or whether operations 
commute. We can adopt either testing-based or formal-methods approaches for this 
problem (or hybrids of both). The rest of the section is devoted to detailed discussions 
on verifying quantum software. 


Three verification approaches that are widely used today are classical simulation, quantum 
property testing, and formal logic. Although these techniques do not prevent or detect all 
types of errors, nor do they scale well to large systems, they are useful for partially 
verifying aspects of the computing process so as to gain some confidence in 
its success rate. Both software and hardware verification are expected to be particularly 
important for NISQ computers, shielding against the adverse impacts of errors from buggy programs, 
unreliable compilers, or noisy hardware. 


5.3.1 TRACING VIA CLASSICAL SIMULATION 


The most widely used verification approach is arguably tracing the evolution of quantum states 
using classical simulation. Both software and hardware verification problems can be tackled using 
classical simulation, because it allows us to exactly compute and compare the inputs and 
outputs of an ideal transformation with those of an actual one. Simulation is informative, in 
that it allows us to do code tracing and reveals the states of a quantum program step by step. 
Classical simulation, however, is not easy, as it requires computing transformations in an 
exponentially large state space. Interestingly, there exists a fundamental tension between classical 
simulation of quantum computations and quantum supremacy. If we can efficiently simulate 
quantum computation on a classical computer, then we have proven that this quantum compu- 
tation does not demonstrate quantum supremacy! Verification approaches involving too much 
of an algorithm's state space also have similar implications. If we are optimistic and assume 
that some quantum algorithms have supremacy over classical algorithms, then we must come 
up with restricted verification properties that only require partial simulation or formal verifica- 
tion of a sub-exponential state space. For readers who are not familiar with classical simulation 
techniques, please see Chapter 9 for details. 
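
The size of the state space can be made concrete with a back-of-the-envelope calculation. The sketch below assumes a dense state-vector simulator that stores one double-precision complex amplitude (16 bytes) per basis state; other simulation methods have different costs.

```python
def statevector_bytes(n_qubits, bytes_per_amplitude=16):
    """Memory needed to hold all 2^n amplitudes of an n-qubit state vector."""
    return (2 ** n_qubits) * bytes_per_amplitude


GIB = 2 ** 30
for n in (20, 30, 40, 50):
    print(n, "qubits:", statevector_bytes(n) / GIB, "GiB")
```

Each additional qubit doubles the requirement: 30 qubits already need 16 GiB under this assumption, and 50 qubits need 16 PiB, which is why tracing a full program state step by step is only feasible for small programs.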

Classical simulation is a useful lens toward the subtle boundary of quantum 
supremacy [146, 155, 179]. Classical computers can simulate quantum algorithms consisting of 
only "Clifford gates" in time polynomial in the number of qubits used in the algorithm, which 
proves that these algorithms do not demonstrate quantum supremacy. Algorithms such as Shor's 
factoring algorithm, which are exponentially better than known classical algorithms, contain T 
gates as well as Clifford gates. We do not know how to classically simulate Shor's algorithm in 
sub-exponential time. We do know, however, how to simulate algorithms consisting of Clifford 
gates and T gates in time polynomial in the number of qubits and exponential in the number 
of T gates [180]. In practice, then, classical simulation can afford circuits with a modest 
number of T gates. Relatedly, resource theories have been 
defined around the number of T gates to help us understand the boundary between classical and 
quantum computing power. Verification through simulation might exploit the classical side of 
this boundary by trying to define correctness properties that only require simulation of parts of 
an algorithm that contain a small number of T gates. What these properties will be, however, 
is very much an open area of research. 
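
The reason Clifford-only circuits are cheap to simulate is that one can track a small set of Pauli operators that stabilize the state instead of tracking amplitudes. The sketch below uses the standard stabilizer-tableau update rules for H, S, and CNOT; it is a minimal illustration with names of our choosing, it omits measurement, and it is not the method of any specific simulator cited here.

```python
def new_tableau(n):
    """Generators Z_0 ... Z_{n-1} of |0...0>, each stored as [x bits, z bits, sign]."""
    return [[[0] * n, [int(i == q) for q in range(n)], 0] for i in range(n)]


def apply_h(tableau, a):
    """H exchanges X and Z on qubit a; Y picks up a minus sign."""
    for gen in tableau:
        x, z = gen[0], gen[1]
        gen[2] ^= x[a] & z[a]
        x[a], z[a] = z[a], x[a]


def apply_s(tableau, a):
    """S maps X to Y and Y to -X on qubit a; Z is unchanged."""
    for gen in tableau:
        x, z = gen[0], gen[1]
        gen[2] ^= x[a] & z[a]
        z[a] ^= x[a]


def apply_cnot(tableau, a, b):
    """CNOT copies X from control a to target b and Z from target b to control a."""
    for gen in tableau:
        x, z = gen[0], gen[1]
        gen[2] ^= x[a] & z[b] & (x[b] ^ z[a] ^ 1)
        x[b] ^= x[a]
        z[a] ^= z[b]


def show(tableau):
    """Render each generator as a signed Pauli string, qubit 0 leftmost."""
    names = {(0, 0): "I", (1, 0): "X", (0, 1): "Z", (1, 1): "Y"}
    return [("-" if sign else "+") + "".join(names[p] for p in zip(x, z))
            for x, z, sign in tableau]


ghz = new_tableau(3)
apply_h(ghz, 0)
apply_cnot(ghz, 0, 1)
apply_cnot(ghz, 1, 2)
print(show(ghz))
```

Each gate touches n generators and does a constant amount of work per generator, so the cost per gate is linear in the number of qubits rather than exponential. A T gate does not map Pauli operators to Pauli operators, which is where this compact description breaks down.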

But even classical simulation does not capture all sources of errors. In order to completely 
imitate the quantum process, we must include the impacts of hardware noise. The difficulty 
of simulation with noise is two-fold. First, our understanding of physical noise today is 
still limited, and modeling realistic noise remains an active field of research. Second, even if we 
had a perfect noise model, noise simulation itself is extremely challenging: no known 
method accurately simulates the effects of noise while scaling sub-exponentially in the 
number of qubits. 


5.3.2 ASSERTION VIA QUANTUM PROPERTY TESTING 


Here we describe how to perform quantum assertions (for instance, testing whether two quantum 
states are equal) in quantum circuits. 

Property testing, also known as hypothesis testing,¹ is an area that aims to design algorithms 
that check whether some (global) property is present or absent in a large object through 
restricted (local) queries to the object. More concretely, following [175], we have the following 
definitions: 

Definition 5.1 A property $P$ for a set of objects $X$ is a subset of $X$, that is, $P \subseteq X$. Let 
$d : X \times X \to [0, 1]$ be a distance measure on $X$. 


• An object $x \in X$ is $\epsilon$-far from $P$ if $d(x, y) \geq \epsilon$ for all $y \in P$. 


• An object $x \in X$ is $\epsilon$-close to $P$ if there exists $y \in P$ such that $d(x, y) \leq \epsilon$. 


Definition 5.2 An algorithm is an $\epsilon$-property tester of $P$ if it accepts $x \in X$ with probability 
at least 2/3 if $x \in P$, and rejects $x \in X$ with probability at least 2/3 if $x$ is $\epsilon$-far from $P$. 


Quantum property testing extends the definitions to use quantum algorithms to test quan- 
tum objects. 


Testing Properties of Quantum States 
Here we briefly describe some of the widely used strategies for discriminating quantum states. 
For a $d$-dimensional pure state $|\psi\rangle \in \mathbb{C}^d$, a convenient distance measure is the trace distance:


$$D_{\mathrm{tr}}(|\psi\rangle, |\phi\rangle) = \frac{1}{2} \big\| |\psi\rangle\langle\psi| - |\phi\rangle\langle\phi| \big\|_1 = \sqrt{1 - |\langle\psi|\phi\rangle|^2},$$


where $\|\cdot\|_1$ is the 1-norm.

Ideally, we want to find an algorithm that tests for a property (that is, an $\epsilon$-tester) using a
number of copies that depends only on $\epsilon$, regardless of $d$. When this is not possible, we attempt
to minimize the dependency on $d$.

In the following we give some example properties of quantum states and illustrate their 
property testers. 


* Testing if a state $|\psi\rangle$ is equal to another known state $|\phi\rangle$. Note that “equal” here means
that the two states are the same up to global phase, that is, satisfying $|\psi\rangle = e^{i\theta}|\phi\rangle$
for some real number $\theta$. The simplest test for this property is to perform the measurement
$\{|\phi\rangle\langle\phi|,\ I - |\phi\rangle\langle\phi|\}$ on $|\psi\rangle$ and accept if the first outcome is observed. Notice that the
acceptance probability is exactly $|\langle\psi|\phi\rangle|^2$, which is 1 if the two states are equal; the
rejection probability is $1 - |\langle\psi|\phi\rangle|^2 = D_{\mathrm{tr}}(|\psi\rangle, |\phi\rangle)^2$, which is at least $\epsilon^2$ if the trace
distance satisfies $D_{\mathrm{tr}}(|\psi\rangle, |\phi\rangle) \geq \epsilon$. So we can repeat the test and verify equality with
$O(1/\epsilon^2)$ copies. We remark that, in fact, any non-trivial property of pure states requires
$\Omega(1/\epsilon^2)$ copies to achieve the desired 2/3 success probability.


¹ This is not to be confused with statistical hypothesis testing, which is a method of statistical
inference based on sampling data sets. However, the two concepts are not unrelated.

Figure 5.2: Testing productness of $|\psi\rangle$ using a series of swap tests on $|\psi\rangle \otimes |\psi\rangle$.


* Testing if two unknown (possibly mixed) states, $\rho$ and $\sigma$, are equal. To test this property,
we introduce an important procedure called the swap test, due to Buhrman et al. [181].
Take two quantum states $\rho$ and $\sigma$ and an ancilla qubit $|0\rangle$. In the swap test circuit, a
controlled-swap gate is sandwiched by two Hadamard gates on the ancilla. When
we measure the third register (ancilla qubit), we accept if the outcome is $|0\rangle$, and reject
if the outcome is $|1\rangle$. In essence, this is a “similarity test,” because one can derive [181,
182]:

  - (Pure state case) $\Pr[\text{accept} \mid |\psi\rangle \otimes |\phi\rangle] = \frac{1}{2} + \frac{1}{2}|\langle\psi|\phi\rangle|^2 = 1 - \frac{1}{2} D_{\mathrm{tr}}(|\psi\rangle, |\phi\rangle)^2$.
  - (Mixed state case) $\Pr[\text{accept} \mid \rho \otimes \sigma] = \frac{1}{2} + \frac{1}{2}\mathrm{tr}(\rho\sigma)$.

We can analyze the probability of rejection similarly (see details in [175]) and obtain
a tester using $O(1/\epsilon^2)$ copies. The swap test is also optimal for testing such a
property. One can generalize to an equality test for multiple states with techniques such
as permutation tests [183].


* Testing if a pure state $|\psi\rangle$ is an entangled state. In particular, we want to test if $|\psi\rangle$ can be
written as a tensor product of $n$ local states (i.e., $|\psi\rangle = |\psi_1\rangle \otimes |\psi_2\rangle \otimes \cdots \otimes |\psi_n\rangle$)
or not (in which case, $|\psi\rangle$ is entangled). We can use a series of swap tests on two
copies of $|\psi\rangle$ and repeat $O(1/\epsilon^2)$ times to test for productness [184, 185]. More
specifically, the swap test is applied to each pair of the $n$ local parts in $|\psi\rangle \otimes |\psi\rangle$, as
shown in Figure 5.2.
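
The pure-state acceptance probability of the swap test can be checked numerically. The following is a minimal sketch, not taken from the book: it follows the ancilla through the Hadamard, controlled-swap, Hadamard sequence using plain Python lists, and the function names `swap_test_accept_probability` and `overlap_squared` are illustrative.

```python
import math

def swap_test_accept_probability(psi, phi):
    """Simulate the swap test on two pure states given as amplitude lists."""
    d = len(psi)
    r = 1 / math.sqrt(2)
    # Joint amplitudes of |psi> (x) |phi>, indexed as joint[i][j].
    joint = [[psi[i] * phi[j] for j in range(d)] for i in range(d)]
    # First Hadamard on the ancilla |0>: both ancilla branches get weight 1/sqrt(2).
    branch0 = [[r * joint[i][j] for j in range(d)] for i in range(d)]
    # Controlled-swap: in the ancilla=1 branch the two registers are exchanged.
    branch1 = [[r * joint[j][i] for j in range(d)] for i in range(d)]
    # Second Hadamard: the ancilla=0 output is (branch0 + branch1) / sqrt(2).
    out0 = [r * (branch0[i][j] + branch1[i][j])
            for i in range(d) for j in range(d)]
    # Probability of measuring the ancilla in |0>, i.e., of accepting.
    return sum(abs(x) ** 2 for x in out0)

def overlap_squared(psi, phi):
    """Return |<psi|phi>|^2."""
    return abs(sum(complex(a).conjugate() * b for a, b in zip(psi, phi))) ** 2

zero = [1, 0]
one = [0, 1]
plus = [1 / math.sqrt(2), 1 / math.sqrt(2)]
for name, state in [("zero", zero), ("plus", plus), ("one", one)]:
    p = swap_test_accept_probability(zero, state)
    print(name, p, 0.5 + 0.5 * overlap_squared(zero, state))
```

Each printed pair compares the simulated acceptance probability with the closed form $\frac{1}{2} + \frac{1}{2}|\langle\psi|\phi\rangle|^2$; orthogonal states are accepted half of the time, which is why the test must be repeated.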




Many other testers exist for properties of quantum states. For example, we can test if a state 
belongs to stabilizer states [157, 186], matrix product states [187], pure vs. mixed states [188], 
low Schmidt-rank states [189], or having arbitrary finite properties [190]. 


Testing Properties of Quantum Dynamics 

One can also test properties of quantum transformations, where we are given a black-box
transformation $U$ and need to decide whether $U$ has some property or is far from having it. Tests
vary depending on assumptions such as whether we are given access to the controlled operator
($c\text{-}U$) and/or the inverse operator ($U^{-1}$) in addition to $U$ itself.

For unitary operators, two common distance measures are used to evaluate the perfor- 
mance of the tests: one for worst-case analysis and the other for average-case analysis. Here 
we take the definitions over pure input states as examples (see Chapter 8 for generalizations to 
mixed states). 


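* For two $d$-dimensional unitary operators $U, V$, we define the worst-case distance over
all possible pure states as:

$$D_{\max}(U, V) = \max_{|\psi\rangle} D_{\mathrm{tr}}(U|\psi\rangle, V|\psi\rangle) = \max_{|\psi\rangle} \sqrt{1 - |\langle\psi|U^\dagger V|\psi\rangle|^2}.$$

* For two $d$-dimensional unitary operators $U, V$, we define the average-case distance as:

$$D_{\mathrm{avg}}(U, V) = \frac{1}{\sqrt{2}} \left\| U \otimes U^* - V \otimes V^* \right\|_2 = \sqrt{1 - |\langle U, V\rangle|^2},$$

where $\|M\|_2 = \sqrt{\frac{1}{d}\sum_{i,j=1}^{d} |M_{ij}|^2}$ is the 2-norm of a $d \times d$ matrix $M$, normalized by its
dimension, and $\langle U, V\rangle = \frac{1}{d}\mathrm{tr}(U^\dagger V)$ is the Hilbert-Schmidt inner product.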

One useful tool that maps properties of quantum states to properties of unitaries is called
the Choi-Jamiołkowski isomorphism [191, 192]. This tool sometimes allows us to take a test for
properties of quantum states and apply it directly to test for properties of unitary operators. The
idea is to first prepare the maximally entangled state of two $d$-dimensional systems

$$|\Phi\rangle = \frac{1}{\sqrt{d}} \sum_{i=1}^{d} |i\rangle |i\rangle,$$

and then apply the unitary operator $U$ to the first system:

$$|U\rangle = (U \otimes I)|\Phi\rangle = \frac{1}{\sqrt{d}} \sum_{i,j=1}^{d} U_{ij} |i\rangle |j\rangle.$$

Now applying tests on the two states $|U\rangle$ and $|V\rangle$, we have equivalently obtained tests
for $U$ and $V$. Some example properties where we can use the Choi-Jamiołkowski isomorphism
include equality of $U$ and $V$, and $U$ being a product operator.

One may also find other tests for (Pauli/Clifford) group membership [190, 193] and
commutativity [194] interesting.
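
To make the connection between the Choi-Jamiołkowski states and the average-case distance concrete, here is a small numerical sketch. It assumes the normalized Hilbert-Schmidt inner product $\langle U, V\rangle = \frac{1}{d}\mathrm{tr}(U^\dagger V)$ and the relation $D_{\mathrm{avg}}(U, V) = \sqrt{1 - |\langle U, V\rangle|^2}$; the helper names are illustrative and not from the book.

```python
import math

def hs_inner(U, V):
    """Normalized Hilbert-Schmidt inner product <U, V> = tr(U^dagger V) / d."""
    d = len(U)
    return sum(complex(U[i][j]).conjugate() * V[i][j]
               for i in range(d) for j in range(d)) / d

def choi_state(U):
    """Amplitudes of |U> = (U (x) I)|Phi>, i.e., U_ij / sqrt(d) on |i>|j>."""
    d = len(U)
    return [U[i][j] / math.sqrt(d) for i in range(d) for j in range(d)]

def state_overlap(u, v):
    """Inner product <u|v> of two amplitude lists."""
    return sum(complex(a).conjugate() * b for a, b in zip(u, v))

def average_case_distance(U, V):
    """D_avg(U, V) = sqrt(1 - |<U, V>|^2)."""
    return math.sqrt(max(0.0, 1 - abs(hs_inner(U, V)) ** 2))

I2 = [[1, 0], [0, 1]]
Z = [[1, 0], [0, -1]]
S = [[1, 0], [0, 1j]]

for name, V in [("I", I2), ("Z", Z), ("S", S)]:
    # The overlap of the Choi states equals the Hilbert-Schmidt inner product.
    print(name, state_overlap(choi_state(I2), choi_state(V)),
          hs_inner(I2, V), average_case_distance(I2, V))
```

Because the overlap of $|U\rangle$ and $|V\rangle$ coincides with $\langle U, V\rangle$, a state equality test such as the swap test applied to the two Choi states estimates how far apart $U$ and $V$ are on average.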


5.3.3 PROOFS VIA FORMAL VERIFICATION 


Some progress has also been made in applying formal methods to verify quantum computations.
Quantum programs are typically implemented with well-defined semantics. We can usually
deduce the behavior of quantum circuits directly from their descriptions. QWire [195] uses
Coq [196] to verify some properties of simple quantum circuits, but classical computation for the
theorem prover scales exponentially with the number of qubits. The Feynman path-sum technique
has recently been used to efficiently test for circuit equivalence [197]. Once again, the key
challenge is to define useful correctness properties that a theorem prover can handle more scalably.
Quantum Hoare logic [198, 199], implemented in Isabelle, is a program logic that simplifies the
full verification of programs. Some tools are designed to verify specific sub-classes of quantum
circuits. For instance, relational quantum Hoare logic [200] is developed for security protocols.
ReVerC [201] targets reversible circuits and is verified with the proof assistant F*.


5.4 SUMMARY AND OUTLOOK 


Some may hope that the “right” quantum programming language will facilitate the development 
of many more novel quantum algorithms, but quantum algorithms have thus far been developed 
with pen and paper using mathematical techniques. More realistically, quantum programming
languages are essential in converting theoretical descriptions of algorithms to practical
implementations that are correct, efficient, and adapted for specific applications. This process
can take months of effort for each algorithm and application.

Quantum programming languages are the first set of abstractions that we have approached 
in this book, and they help set the framework for optimization and verification. In subsequent 
chapters we shall see that these abstractions are still fluid as quantum computer systems develop 
and we explore which cross-layer optimizations are critical for efficiency. 


Further Reading 

The chapter has mostly discussed quantum programming languages developed around the
circuit model of quantum computation. These languages represent quantum circuits at multiple
levels of abstraction, and offer powerful tools for circuit optimization. Among them, some have
device-driven designs, including OpenQASM [163] and Quil [172]; others have
algorithm-driven designs, such as Q# [169].

In the case of functional programming languages, the lambda calculus model for quantum
computation [202, 203] serves as their theoretical foundation. In particular, lambda calculus has
motivated a series of developments in type systems for quantum computation, which result in
proofs on quantum data and quantum functions [166, 195].

Some recent developments in quantum programming languages have escaped the notion
of quantum circuit; rather, they model a quantum process by undirected graphs, such as the
ZX-diagrams. These graphical models lead to languages such as ZX-calculus [204, 205] and
ZW-calculus [206]. The ZX-diagrams are closely related to the tensor network representation
and undirected graphical models introduced in Chapter 9; we encourage the interested reader
to make the connection.




CHAPTER 6


Circuit Synthesis and 
Compilation 


Practical quantum computation may be achievable in the next few years, but applications will 
need to be error-tolerant and make the best use of a relatively small number of quantum bits and 
operations. Compilation tools will play a critical role in achieving these goals. The job of a quan- 
tum compiler is to translate a quantum program written in a high-level programming language 
into native instructions recognizable by the hardware, through a series of transformations and 
optimizations. Traditional wisdom from compilation for classical computers can occasionally be 
inherited or adapted to the quantum case, such as resolving control flows and allocating registers. 
This chapter puts particular emphasis on the aspects of compilation that are unique to quantum 
computers. Notably, compilation under strict resource constraints has proven challenging, and 
optimization will have to break traditional abstractions and be customized to algorithmic and 
device characteristics in a manner never before seen in classical computing. We call attention to 
a number of important steps specialized for quantum compilation to help ensure the efficiency 
and correctness needed. To name a few, unitary synthesis focuses on exactly or approximately 
expressing arbitrary unitary transformations (such as single qubit rotations by an arbitrary angle) 
in a sequence of elementary gates. The goal of gate scheduling is to utilize commutation relations 
to determine the ordering of the (possibly parallel) operations, and to use circuit equivalence to 
simplify quantum programs. Qubit mapping is another challenge, in that we aim to strategically 
assign the variables in a quantum program to the qubits available in the system, under multiple 
constraints such as limited connectivity between qubits, fluctuations in the reliability of qubits 
and links, and potential opportunities for reclamation and reuse of qubits, etc. 


6.1 SYNTHESIZING QUANTUM CIRCUITS 


This section aims to address one essential question in quantum compiling, namely how to
(efficiently) implement some arbitrary unitary transformation using a given finite set of realizable
quantum gates (i.e., primitive instructions). The complexity of the problem varies, depending on
the objectives and assumptions. As a result, the efficiency and optimality of the solutions vary. So it
is important to recognize the different situations being considered in the community, and
categorize the known synthesis techniques into classes accordingly. Some example types of synthesis
techniques considered in the rest of the section can be summarized as follows:

* choice of universal instruction set;
* single-qubit, multi-qubit, and qudit (i.e., $d$-level quantum logic) synthesis; and
* exact and approximate synthesis.


The existence of an efficient synthesis of quantum circuits is largely determined by the 
choice of instruction set; one can imagine some quantum gates to be more "powerful" than 
others. Here powerful, which will be defined in subsequent sections, can be informally thought 
of as being able to cover the entire space of possible unitary gates more quickly. Furthermore, 
synthesis for multi-qubit unitaries or qudit unitaries is believed to be more difficult in general due 
to the high dimensionality involved; a general strategy is to decompose the high-dimensional 
unitary matrices into pieces of one- or two-qubit unitary matrices, for which efficient synthesis 
methods are known. Last, strategies for exactly or approximately synthesizing quantum circuits 
differ significantly, and consequently, they can have drastically different complexity. 

For example, if a quantum program is to be executed on a superconducting NISQ computer
(without quantum error correction), then the instruction set would likely consist of single-qubit
rotation ($R(\theta)$) gates and two-qubit cross-resonance (CR) gates, for they are easier to
implement to high precision. Consequently, the target transformation $U$ of the quantum program
is synthesized, exactly or approximately (depending on the precision tolerance). The synthesis
procedure typically involves first decomposing the multi-qubit unitary $U$ into a sequence of
single-qubit unitaries and two-qubit CR gates, and then decomposing the single-qubit unitaries into
Pauli rotation gates. The goal would be to synthesize the most efficient circuit (e.g., short in
depth and small in number of qubits) to some high precision required by the target computer.

We have just demonstrated a typical example in circuit synthesis; the rest of the section 
focuses on illustrating systematically how to realize arbitrary quantum circuits under various 
conditions. 


6.1.1 CHOICE OF UNIVERSAL INSTRUCTION SET 


The first step in synthesizing quantum circuits is to choose a computationally universal quantum 
instruction set. What does it mean for a quantum instruction set S to be universal? 


Definition 6.1 A quantum instruction set S is called computationally universal if and only if 
gates from S allow the realization of an arbitrary quantum circuit C. 


One might worry that such a realization could consist of a large number of gates from $S$.
Fortunately, efficient universality results can be obtained for some instruction sets, which is the
focus of the section.

One way of demonstrating that a given instruction set is universal is by providing a
constructive algorithm for exactly decomposing an arbitrary unitary transformation into a product
of gates from the instruction set. It is convenient, and sometimes equally valuable, to relax the
constraint on exact synthesis, and show a product of gates from the instruction set that is very
close to the target unitary transformation, i.e., a universal realization up to some precision. In the
following, we show that one- and two-qubit unitary gates are universal by exactly synthesizing
an arbitrary $n$-qubit unitary transformation.

An arbitrary single-qubit unitary transformation can be mathematically represented as a
$2 \times 2$ matrix:

$$\begin{pmatrix} e^{i\alpha}\cos(\theta) & e^{i\beta}\sin(\theta) \\ e^{-i\beta}\sin(\theta) & -e^{-i\alpha}\cos(\theta) \end{pmatrix},$$

which is parameterized by continuous variables $\alpha, \beta, \theta$. More generally, the set of all possible
$n$-qubit gates is the determinant-1 unitary transformations on a $2^n$-dimensional vector space,
denoted by the group $SU(2^n)$. In this notation, one can rewrite the definition of computational
universality as follows.


Theorem 6.2 One-qubit and two-qubit unitary gates are universal; an arbitrary $n$-qubit unitary
transformation $U$ in $SU(d)$, where $d = 2^n$, can be represented as a product of $O(2^{2n})$ matrices of the
block form:

$$\Gamma_i(V) = \begin{pmatrix} I_{i-1} & 0 & 0 \\ 0 & V & 0 \\ 0 & 0 & I_{d-i-1} \end{pmatrix},$$

where $V \in SU(2)$ is a $2 \times 2$ matrix, and $I_k$ is a $k \times k$ identity matrix.


Each $\Gamma_i(V)$ can be thought of as a two-qubit gate on the $i$th and the $(i+1)$th qubits,
while keeping the rest untouched. We explain how a target unitary transformation acting on
$n$ qubits is decomposed into a product of two-qubit unitaries. In this construction, we find a
sequence of suitable matrices $W_1, W_2, \ldots, W_{2^n}$ such that

$$W_{2^n} \cdots W_2 W_1 = U.$$

It is equivalent to write

$$W_{2^n} \cdots W_2 W_1 U^{-1} = I.$$

The key in this construction is that each $W_j$, for $j \in [2^n]$, transforms the $j$th column of $U^{-1}$
to the $j$th column of $I$. $W_j$ is accomplished by a product of $\Gamma_i(V)$, for $i \in [2^n]$, where each
$\Gamma_i(V)$ transforms the $i$th and $(i+1)$th qubits. Since $\Gamma_i(V)$ only operates on the subspace of
two qubits, it is sufficient to see that there exists a matrix $V$ such that

$$V \begin{pmatrix} a \\ b \end{pmatrix} = \begin{pmatrix} \sqrt{|a|^2 + |b|^2} \\ 0 \end{pmatrix}$$

for any numbers $a, b \in \mathbb{C}$ that are not both zero. Indeed, we can write down such a matrix:

$$V = \frac{1}{\sqrt{|a|^2 + |b|^2}} \begin{pmatrix} a^* & b^* \\ -b & a \end{pmatrix}.$$

This is therefore an algorithm for exactly decomposing an arbitrary unitary $U$ into a product of
$O(2^n \cdot 2^n)$ matrices of the form $\Gamma_i(V)$, each of which acts only on a two-qubit subspace. It is
also important to note that, although this is an exact synthesis method, the precision $\delta$ of the
complex numbers will impact the final synthesis precision, which adds a $\mathrm{poly}(\log(1/\delta))$ factor to the
final algorithm complexity.
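
The elementary step of this construction, finding a $2 \times 2$ unitary $V$ that rotates a pair of amplitudes $(a, b)$ onto $(\sqrt{|a|^2 + |b|^2}, 0)$, can be sketched as follows. The sketch assumes the determinant-1 form $V = \frac{1}{\sqrt{|a|^2+|b|^2}}\begin{pmatrix} a^* & b^* \\ -b & a \end{pmatrix}$; the function names are illustrative.

```python
import math

def zeroing_unitary(a, b):
    """Return the 2x2 matrix V with V (a, b)^T = (sqrt(|a|^2 + |b|^2), 0)^T.

    Assumes a and b are not both zero.
    """
    a, b = complex(a), complex(b)
    norm = math.sqrt(abs(a) ** 2 + abs(b) ** 2)
    return [[a.conjugate() / norm, b.conjugate() / norm],
            [-b / norm, a / norm]]

def apply(V, vec):
    """Multiply a 2x2 matrix by a length-2 vector."""
    return [V[0][0] * vec[0] + V[0][1] * vec[1],
            V[1][0] * vec[0] + V[1][1] * vec[1]]

def determinant(V):
    return V[0][0] * V[1][1] - V[0][1] * V[1][0]

a, b = 3, 4j
V = zeroing_unitary(a, b)
print(apply(V, [a, b]))   # first entry is the norm, second entry vanishes
print(determinant(V))     # determinant 1, so V is in SU(2)
```

Repeating this step on successive pairs of entries clears a column of $U^{-1}$ one entry at a time, which is where the $O(2^n \cdot 2^n)$ count of elementary matrices comes from.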

It is also important to note that two-qubit unitary gates are necessary for universal quantum
computation, as single-qubit gates alone cannot produce the entanglement between qubits
required by almost all interesting quantum algorithms.

Theorem 6.2 has significant practical implications. In principle, a universal quantum computer
can be realized by implementing one- and two-qubit unitary gates to high precision. In
practice, directly implementing a unitary transformation $U \in SU(2^n)$ would require exponentially
complex control mechanisms for all the parameters in the large $O(2^{2n})$ matrix, while unitary
synthesis techniques allow one to realize an equivalent transformation using only control
mechanisms that act on one or two qubits at a time.

In particular, for experimentalists, much engineering effort has been put into physically
implementing single-qubit gates and CNOT gates on qubits. This is because, as noted in
Chapter 2, single-qubit gates and CNOT gates are universal. The proof is omitted here, but can be
found in [86]. The key idea is to show that single-qubit gates and CNOT gates can be used to
make up all two-qubit unitaries. Thus, following Theorem 6.2, we can show that any quantum
circuit made of n-qubit gates can be exactly simulated by single-qubit gates and CNOT gates,
with only a linear increase in the number of gates.

From an algorithmic/number-theoretic perspective, choosing a universal instruction set 
with the most convenient structure is preferred. As we will see in the following section, highly 
structured instruction sets yield fast and efficient synthesis algorithms. Some examples of
structured instruction sets [207-211] include the Clifford-T set, the Clifford-cyclotomic set, and the
V-basis set.


6.1.2 EXACT SYNTHESIS 


Following the previous section, where we showed a constructive algorithm for exactly synthesizing
an arbitrary unitary in $SU(d)$, we continue to provide some exact synthesis examples. The
hope is that with prior knowledge of the structure of a target unitary transformation, we can
find a more efficient circuit implementation. Indeed, much progress has been made in specialized
synthesis algorithms. We focus our discussion on some commonly used transformations,
such as controlled unitary gates $\Lambda_k(U)$ (where $U$ is performed on the target qubit(s)
conditioned on the state of the $k$ control qubit(s)), quantum oracle gates $O_f$ (where a classical
reversible function $f$ is computed on the quantum state), and rotation gates $R_\sigma(\theta)$ (where the
quantum state is rotated by an angle $\theta$ about an axis $\sigma$). Last, we briefly
demonstrate exact synthesis to Clifford+T gates, as an example technique where the structure
of the instruction set is relatively well understood.

Figure 6.1: A generic decomposition for the controlled unitary gate ($\Lambda(U)$). Note that $P$ is the
phase shift gate $P = |0\rangle\langle 0| + e^{i\alpha} |1\rangle\langle 1|$.


It is important to note that Clifford+T synthesis originates from compilations for fault- 
tolerant (FT) machines, where Clifford operations are easy to realize using the stabilizer for- 
malism (see details in Chapter 8). An exciting open direction is to define, for NISQ computers, 
an instruction set with rich structures and efficient synthesis algorithms. 


Synthesizing Controlled-Unitary 
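Recall that a controlled unitary gate (namely $\Lambda(U)$ or $c\text{-}U$) acts on $1 + t$ qubits, 1 of which is
the control qubit and $t$ of which are the target qubits:

$$\Lambda(U)\, |q_1\rangle \otimes |\psi\rangle = |q_1\rangle \otimes U^{q_1} |\psi\rangle = \begin{cases} |q_1\rangle \otimes U|\psi\rangle & \text{if } |q_1\rangle = |1\rangle, \\ |q_1\rangle \otimes |\psi\rangle & \text{if } |q_1\rangle = |0\rangle. \end{cases}$$

More generally, a multi-controlled-unitary gate acts on $(k + t)$ qubits, $k$ of which are the
control qubits and $t$ of which are the target qubits:

$$\Lambda_k(U)\, |q_1 \cdots q_k\rangle \otimes |\psi\rangle = |q_1 \cdots q_k\rangle \otimes U^{q_1 \cdots q_k} |\psi\rangle = \begin{cases} |q_1 \cdots q_k\rangle \otimes U|\psi\rangle & \text{if } |q_1 \cdots q_k\rangle = |1 \cdots 1\rangle, \\ |q_1 \cdots q_k\rangle \otimes |\psi\rangle & \text{otherwise,} \end{cases}$$

where $q_1 \cdots q_k$ in the exponent of $U$ is the product of the bits $q_1, \ldots, q_k$.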
The general strategy for synthesizing controlled unitary gates ($\Lambda(U)$) is to rewrite the unitary
$U$ in a product form:

$$U = e^{i\alpha} A_m B_m \cdots A_2 B_2 A_1 B_1,$$

satisfying $A_m \cdots A_2 A_1 = I$, where the $B_j$ are chosen such that the $\Lambda(B_j)$ are easier to implement than
$\Lambda(U)$ itself, e.g., $B_j = X$ so that $\Lambda(X)$ is the CNOT gate. Here $\alpha$ is a phase, and $m$ is
typically to be minimized. The key idea is that now the controlled unitary $\Lambda(U)$ is decomposed as
$\Lambda(U) = \Lambda(e^{i\alpha})(I \otimes A_m)\Lambda(B_m) \cdots (I \otimes A_2)\Lambda(B_2)(I \otimes A_1)\Lambda(B_1)$, that is, a sequence of
interleaved non-controlled unitary gates and controlled unitary gates. Note that $\Lambda(e^{i\alpha})$ denotes the
controlled phase shift gate: $\Lambda(e^{i\alpha}) = \begin{pmatrix} 1 & 0 \\ 0 & e^{i\alpha} \end{pmatrix} \otimes I$. The decomposition can be visualized as the circuit
in Figure 6.1.

It has been shown that an arbitrary single-qubit unitary $U$ can be decomposed into $U = e^{i\alpha} C X B X A$,
where $X$ is the NOT gate and $CBA = I$ [86]. As a result, $\Lambda(U)$ is implemented using single-qubit
gates and CNOT gates. Since the choice of $A, B, C$ is non-unique, one often needs to further
optimize the synthesis of $\Lambda(U)$ by intelligently choosing a decomposition of $U$.

Figure 6.2: A quantum circuit implementation of the reversible Toffoli gate as a sequence of
single- and two-qubit gates.
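
As a sanity check on the decomposition $U = e^{i\alpha} C X B X A$ with $CBA = I$, the sketch below verifies both branches of the controlled gate for one hand-picked example: the T gate, written as $e^{i\pi/8} R_z(\pi/4)$ with $A = I$, $B = R_z(-\pi/8)$, and $C = R_z(\pi/8)$. This particular choice of $A, B, C$ and the helper names are illustrative assumptions, not the book's construction.

```python
import cmath
import math

def matmul(*matrices):
    """Left-to-right product of 2x2 matrices: matmul(P, Q, R) = P Q R."""
    result = [[1, 0], [0, 1]]
    for M in matrices:
        result = [[sum(result[i][k] * M[k][j] for k in range(2))
                   for j in range(2)] for i in range(2)]
    return result

def rz(theta):
    """Rotation about the z axis: diag(e^{-i theta/2}, e^{i theta/2})."""
    return [[cmath.exp(-1j * theta / 2), 0], [0, cmath.exp(1j * theta / 2)]]

def close(M, N, tol=1e-9):
    return all(abs(M[i][j] - N[i][j]) < tol for i in range(2) for j in range(2))

X = [[0, 1], [1, 0]]
IDENTITY = [[1, 0], [0, 1]]

def controlled_branches(alpha, A, B, C):
    """Return what the target sees when the control is |0> and when it is |1>."""
    off = matmul(C, B, A)                      # CNOTs do nothing: C B A
    phase = cmath.exp(1j * alpha)              # contributed by Lambda(e^{i alpha})
    on = [[phase * x for x in row] for row in matmul(C, X, B, X, A)]
    return off, on

T = [[1, 0], [0, cmath.exp(1j * math.pi / 4)]]
off, on = controlled_branches(math.pi / 8, IDENTITY, rz(-math.pi / 8), rz(math.pi / 8))
print(close(off, IDENTITY), close(on, T))
```

With the control in $|0\rangle$ the target sees $CBA = I$, and with the control in $|1\rangle$ it sees $e^{i\alpha} C X B X A = U$, so two CNOT gates and a handful of single-qubit gates implement the controlled-T gate in this example.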

Similar logic applies to multi-controlled unitary gates ($\Lambda_k(U)$) [86]. An important example
is the Toffoli gate. Figure 6.2 shows a synthesized Toffoli gate using Hadamard gates, T
gates (and their inverses), and CNOT gates.


Synthesizing Quantum Oracles 

Quantum oracles are essential components in quantum query algorithms; they allow one to 
compute classical Boolean functions on quantum states. Implementations of quantum oracles 
have been largely under-studied, partially because a traditional analysis of the query complexity 
of a quantum algorithm typically assumes that oracles are given by another party, but aims
to bound the number of accesses made to the oracles. However, in practice, oracles need to be 
implemented just like the rest of the algorithm. The cost of the oracles eventually contributes to 
the time cost of the overall algorithms. As such, this section is devoted to general strategies for 
synthesizing quantum oracles. 

Do all classical Boolean functions have quantum circuit implementations? For those that 
do, can we efficiently synthesize them? The basic principles introduced in Chapter 2 give rise 
to the potential computing power that quantum computers possess, but at the same time, they 
impose strict constraints on what we can do in quantum computation. For example, the trans- 
formation rule implies that any quantum logic gate we apply to a qubit has to be reversible. The 
classical AND gate in Figure 6.3 is not reversible because we cannot recover the two input bits
based solely on one output bit. To make it reversible, we could introduce a scratch bit, called 
ancilla, to store the result out-of-place, as in a controlled-controlled-NOT gate (or Toffoli gate) 
in Figure 6.3. As the arithmetic complexity scales up when tackling difficult computational 
problems, we quickly see extensive usage of ancilla bits in our circuits due to this reversibility 
constraint. 
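
The reversibility argument can be illustrated on classical bits. This is a minimal sketch in which `toffoli` is an illustrative name: it writes the AND of the two control bits into an ancilla out-of-place, so every input can be recovered from the output.

```python
from itertools import product

def toffoli(a, b, c):
    """Controlled-controlled-NOT on bits: flip c exactly when a = b = 1."""
    return a, b, c ^ (a & b)

# With the ancilla initialized to 0, the third output bit is AND(a, b).
for a, b in product([0, 1], repeat=2):
    print((a, b), "->", toffoli(a, b, 0))

# Reversibility: the map is a bijection on 3-bit strings and is its own inverse.
outputs = {toffoli(*bits) for bits in product([0, 1], repeat=3)}
print(len(outputs) == 8)
print(all(toffoli(*toffoli(*bits)) == bits for bits in product([0, 1], repeat=3)))
```

The price of reversibility is the extra ancilla bit, which is why ancilla usage grows quickly as the embedded arithmetic becomes more complex.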

We first demonstrate an example where the oracle is simple to synthesize, namely, the
phase oracle for the Bernstein-Vazirani algorithm from Chapter 3; ultimately, we will show that
reversible logic synthesis tools are necessary for more complex functions.

Figure 6.3: Circuit diagram for the irreversible AND gate and the reversible Toffoli gate.

Recall that, in the Bernstein-Vazirani algorithm, a phase oracle implements a Boolean
function $f : \{0, 1\}^n \to \{0, 1\}$, which encodes a secret string $s \in \{0, 1\}^n$ as follows:

$$f(x) = \sum_{i=1}^{n} x_i s_i \bmod 2.$$

When applied to a quantum state $|\psi\rangle$, the oracle $O_f$ accumulates a phase on $|\psi\rangle$ depending on
the output of $f$. Without loss of generality, we consider a basis state of $|\psi\rangle$, denoted as $|q_1 q_2 \ldots q_n\rangle$:

$$O_f |q_1 q_2 \cdots q_n\rangle = (-1)^{f(q_1 q_2 \cdots q_n)} |q_1 q_2 \cdots q_n\rangle.$$

The key to synthesizing an oracle $O_f$ for a particular secret string $s \in \{0, 1\}^n$ is a technique
called phase kickback.


Phase Kickback 


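In Chapter 2 we demonstrated that a CNOT gate flips the state of the target bit, conditioned
on the control bit. In the following, we will see that the control bit is sometimes
affected by the CNOT gate as well. For example, we have the following phase kickback circuit:

[Circuit: a CNOT gate whose control qubit enters as $\alpha|0\rangle + \beta|1\rangle$ and leaves as
$\alpha|0\rangle - \beta|1\rangle$, and whose target qubit enters and leaves as $|-\rangle$.]

Normally the state of the control bit does not change after a CNOT gate. Here if the target
bit is in the $|-\rangle$ state, as the name of the circuit suggests, a phase (in this case, a minus
sign) is "kicked" onto the control bit.

The control qubit is in an arbitrary state $\alpha|0\rangle + \beta|1\rangle = \begin{pmatrix} \alpha \\ \beta \end{pmatrix}$ and the target qubit is
$|-\rangle = \frac{1}{\sqrt{2}}(|0\rangle - |1\rangle) = \frac{1}{\sqrt{2}}\begin{pmatrix} 1 \\ -1 \end{pmatrix}$. So the 2-qubit system has a joint state:

$$(\alpha|0\rangle + \beta|1\rangle) \otimes |-\rangle = \frac{1}{\sqrt{2}}\left(\alpha|00\rangle - \alpha|01\rangle + \beta|10\rangle - \beta|11\rangle\right).$$

The CNOT gate exchanges $|10\rangle$ and $|11\rangle$, which gives

$$\frac{1}{\sqrt{2}}\left(\alpha|00\rangle - \alpha|01\rangle - \beta|10\rangle + \beta|11\rangle\right) = (\alpha|0\rangle - \beta|1\rangle) \otimes |-\rangle.$$

Remarkably, by performing the CNOT gate, we have essentially changed the state
of the control qubit, i.e., adding a phase to it. Therefore, the CNOT gate accomplishes a
phase kickback.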





Now with phase kickback, we remark that synthesizing $O_f$ becomes straightforward:
for every bit $b_i$ in string $s$, if $b_i = 1$, we add a CNOT gate between $|q_i\rangle$ and the bottom
qubit $|-\rangle$, as shown in Figure 6.4.
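
The CNOT-based oracle can be checked on computational basis states with a few lines of Python. The sketch below tracks only the two amplitudes of the bottom qubit, which is enough because each CNOT acts on that qubit alone once the control bits are fixed; the function names are illustrative.

```python
import math
from itertools import product

def f(x, s):
    """Bernstein-Vazirani function: inner product of x and s modulo 2."""
    return sum(x_i * s_i for x_i, s_i in zip(x, s)) % 2

def oracle_phase(x, s):
    """Phase picked up by |x> when the CNOT-based oracle acts on |x>|->."""
    r = 1 / math.sqrt(2)
    target = [r, -r]                      # amplitudes of |-> on (|0>, |1>)
    for x_i, s_i in zip(x, s):
        # A CNOT from q_i to the bottom qubit is present only where s_i = 1,
        # and it flips the bottom qubit only when the control bit x_i is 1.
        if s_i == 1 and x_i == 1:
            target = [target[1], target[0]]
    # The bottom qubit is now +|-> or -|->; the sign is the kicked-back phase.
    return 1 if target[0] > 0 else -1

secret = [1, 0, 1]
for x in product([0, 1], repeat=3):
    print(x, oracle_phase(x, secret), (-1) ** f(x, secret))
```

Every basis state $|x\rangle$ acquires the phase $(-1)^{f(x)}$ while the bottom qubit returns to $|-\rangle$, so by linearity the circuit implements $O_f$ on arbitrary superpositions.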


More generally, a typical synthesizer follows a two-step procedure to generate the suitable 
quantum program that implements the desired reversible arithmetic. 


1. Find an efficient implementation of the desired function using reversible logic gates. 


2. Implement each reversible logic gate using quantum gate(s). For example, a Toffoli gate
can be implemented with a sequence of single-qubit gates and a few CNOT gates, as shown
in Figure 6.2.

For small arithmetic logic, algorithms exist to directly synthesize reversible circuits from
the truth tables of the desired function. This typically works well for small low-level combi- 
national functions, but not for functions with internal states [212]. As the complexity of the 
arithmetic in an algorithm scales up, modularity quickly becomes convenient, and in many cases 
necessary. That is, to construct high-level arithmetic, we need to start with all modular subrou- 
tines. 

In reversible logic synthesis and optimization, besides making our circuit for the reversible 
function contain as few gates as possible, we would also like to minimize the amount of scratch 
memory (i.e., number of ancilla bits) used in the circuit. Fortunately, there is a way to recycle 
ancilla bits for later reuse. For a circuit that makes extensive use of scratch memory, managing 
the allocation and reclamation of the ancilla bits becomes critical to producing an efficient im- 
plementation of the function. The technique is called “uncomputation,” formalized by Charles 
Bennett [213]. Details on uncomputation are deferred to Section 6.4.3. 

To emphasize the importance of reversible logic synthesis, we remark that reversible arith- 
metic plays a pivotal role in many known quantum algorithms. The advantage of quantum algo- 
rithms is thought to stem from their ability to pass a superposition of inputs into a classical 
function at once, whereas a classical algorithm can only evaluate the function on a single
input at a time. Many quantum algorithms involve computing classical functions, which must
be embedded in the form of reversible arithmetic subroutines in quantum circuits. For exam- 
ple, Shor’s factoring algorithm [12] uses classical modular-exponentiation arithmetic, Grover's 
searching algorithm [13] also implements its underlying search problem as an oracle subroutine, 
and the HHL algorithm for solving a linear system of equations contains an expensive reciprocal 
step [214]. These reversible arithmetic subroutines are typically the most resource-demanding 
computational components of the entire quantum circuit. 


Synthesis Over Structured Instruction Sets 
When a target instruction set has well-understood, nice structures, exact synthesis of quantum
circuits over such an instruction set can be efficiently (and sometimes optimally) implemented.
Indeed, in this section, we explain how to exploit such structures from an algebraic and
number-theoretic perspective. In particular, we examine single-qubit synthesis algorithms over
the Clifford-T set.

Recall that the single-qubit Pauli gates are denoted as

$$X = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}, \quad Y = \begin{pmatrix} 0 & -i \\ i & 0 \end{pmatrix}, \quad Z = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}.$$

Definition 6.3 The Pauli group $\mathcal{P}$ contains all quantum gates generated by $\langle X, Y, Z, i \rangle$, where
$X, Y, Z$ are the Pauli gates and $i$ is the imaginary unit.

So any gate that can be written as a product of the Pauli group generators belongs to the
Pauli group, e.g., $-I = i^2 X^2$.


Definition 6.4 The single-qubit Clifford group $\mathcal{C}$ contains all single-qubit quantum gates
generated by $\langle H, S, \omega \rangle$, where $H = \frac{1}{\sqrt{2}}\begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix}$ is the Hadamard gate, $S = \begin{pmatrix} 1 & 0 \\ 0 & i \end{pmatrix}$ is the Phase
gate, and $\omega = e^{i\pi/4}$.

Although our focus in this section is on single-qubit synthesis, we note,
for completeness, that the two-qubit Clifford group can be generated by
$\langle H \otimes I, I \otimes H, S \otimes I, I \otimes S, \mathrm{CNOT}, \omega \rangle$. For example, the controlled-phase (CZ) gate is one
of the Clifford gates, as $\mathrm{CZ} = (I \otimes H)\,\mathrm{CNOT}\,(I \otimes H)$.

The Clifford gates are of particular interest for fault-tolerant quantum computers, as they
are well-suited for many quantum error correction codes built upon the stabilizer formalism
(please see Chapter 8 for details).

By adding the T gate, $T = \begin{pmatrix} 1 & 0 \\ 0 & e^{i\pi/4} \end{pmatrix}$, we arrive at a universal instruction set. That is
to say, for single-qubit circuits, $\langle H, T, \omega \rangle$ is universal. Throughout this book, we might refer to
such an instruction set as the Clifford-T set.

Exact Clifford-T synthesis was first proposed in [215] and studied in [207, 208, 210, 216].
Many of the synthesis tools rely on the observation that an arbitrary (single-qubit) quantum
circuit can be written in some normal form. Here we follow the analysis based on the
Matsumoto-Amano normal form.

Recall from Section 2.2 that we define the Bloch sphere representation of quantum states:

$$\rho = |\psi\rangle\langle\psi| = \frac{1}{2}(I + xX + yY + zZ) \equiv \begin{pmatrix} x \\ y \\ z \end{pmatrix}.$$

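Definition 6.5 The Bloch sphere representation, denoted as $\hat{U} \in SO(3)$, of a unitary gate U
is defined as the linear operator

$$\hat{U} \begin{pmatrix} x \\ y \\ z \end{pmatrix} = \begin{pmatrix} x' \\ y' \\ z' \end{pmatrix}$$

satisfying $U(xX + yY + zZ)U^\dagger = x'X + y'Y + z'Z$.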


For example, the Hadamard gate H can be written in this form as:

$$\hat{H} = \begin{pmatrix} 0 & 0 & 1 \\ 0 & -1 & 0 \\ 1 & 0 & 0 \end{pmatrix}.$$

By writing down the Bloch sphere representations for the Clifford-T generators
$\langle H, S, T \rangle$, we observe that the entries in an arbitrary $\hat{U}$ are in the ring

$$\mathbb{Z}\left[\tfrac{1}{\sqrt{2}}\right] =
\left\{ \frac{a + b\sqrt{2}}{\sqrt{2}^{\,n}} : a, b \in \mathbb{Z},\ n \in \mathbb{N} \right\},$$

where $\mathbb{Z}$ and $\mathbb{N}$ denote the sets of integers and natural numbers, respectively.
The proof of the statement can be found in [216]. In fact, if U is a single-qubit Clifford-T
unitary, then its unitary matrix entries are in the ring $\mathbb{Z}\left[\tfrac{1}{\sqrt{2}}, i\right]$.


The Matsumoto-Amano normal form construction goes as follows. We start with any
single-qubit Clifford-T unitary U, written in a generic form:

$$U = C_n T \cdots C_2 T C_1 T C_0,$$

where $C_i$ is in the Clifford group, for $i \in \{0, 1, \ldots, n\}$. The objective is to simplify
the above expression. Upon inspection of the Bloch sphere representation of the generators in the
Clifford-T set, one can show that U can be uniquely rewritten in the form of

$$U = M_k \cdots M_2 M_1 C_0,$$

where $k \ge 0$, and $M_i \in \{T, HT, SHT\}$ for $i \in [k]$. Now we want to determine the $M_i$'s
iteratively. For convenience, we write out the Bloch sphere representation of the matrices of
interest:

$$\hat{U} = \begin{pmatrix} u_{11} & u_{12} & u_{13} \\ u_{21} & u_{22} & u_{23} \\ u_{31} & u_{32} & u_{33} \end{pmatrix}, \quad
\hat{S} = \begin{pmatrix} 0 & -1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \end{pmatrix}, \quad
\hat{T} = \frac{1}{\sqrt{2}} \begin{pmatrix} 1 & -1 & 0 \\ 1 & 1 & 0 \\ 0 & 0 & \sqrt{2} \end{pmatrix},$$

where the entries of $\hat{U}$ are defined as $u_{ij} = \frac{a_{ij} + b_{ij}\sqrt{2}}{\sqrt{2}^{\,k}}$
with integers $a_{ij}, b_{ij}$.

The key observation is that the denominator of the entries in $\hat{U}$, namely the $\sqrt{2}^{\,k}$,
can be obtained from a product of T operators, interleaved with Clifford operators. Every time
we multiply the inverse of T on the left of U, we reduce k by one. Specifically, we want
$M_1^{-1} M_2^{-1} \cdots M_k^{-1} U$ to arrive at a Clifford operator $C_0 \in C$. There exists a
constant-time algorithm for finding a suitable $M_i$ for each iteration. [207] shows that
$M_i \in \{T, HT, SHT\}$ for all $i \in [k]$. One can also prove that such a normal form is unique
[216]. Hence, this algorithm finds a Clifford-T circuit with the optimal number of T gates.
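
The Bloch sphere representations of H, S, and T used in this construction can be checked
numerically. The following is a minimal sketch, assuming the entry formula
$\hat{U}_{jk} = \frac{1}{2}\mathrm{tr}(P_j U P_k U^\dagger)$ with $(P_1, P_2, P_3) = (X, Y, Z)$, which follows
from Definition 6.5; the helper names (`matmul`, `dagger`, `bloch_rep`) are illustrative and not
taken from any library.

```python
import math

# 2x2 complex matrices are stored as nested lists; all helpers are illustrative.
def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def dagger(a):
    return [[a[j][i].conjugate() for j in range(2)] for i in range(2)]

def trace(a):
    return a[0][0] + a[1][1]

X = [[0, 1], [1, 0]]
Y = [[0, -1j], [1j, 0]]
Z = [[1, 0], [0, -1]]
PAULIS = [X, Y, Z]

def bloch_rep(u):
    """Return the 3x3 real matrix with entries (1/2) tr(P_j U P_k U^dagger)."""
    ud = dagger(u)
    return [[(trace(matmul(PAULIS[j], matmul(u, matmul(PAULIS[k], ud)))) / 2).real
             for k in range(3)] for j in range(3)]

r = 1 / math.sqrt(2)
H = [[r, r], [r, -r]]
S = [[1, 0], [0, 1j]]
T = [[1, 0], [0, complex(r, r)]]   # e^{i pi/4} = (1 + i)/sqrt(2)

for name, gate in (("H", H), ("S", S), ("T", T)):
    print(name, [[round(x, 4) for x in row] for row in bloch_rep(gate)])
```

The printed matrices should agree with the $\hat{H}$, $\hat{S}$, and $\hat{T}$ given in the text, with
the entries of $\hat{T}$ equal to $\pm 1/\sqrt{2}$, 0, or 1.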

Besides the Matsumoto-Amano normal form construction, there exist other synthesis
tools that exploit the structure of Clifford-T circuits [208, 211, 217]. Many techniques have
been developed for other similar instruction sets, such as the Clifford-cyclotomic set [209], the
V-basis set [210, 218], and the Clifford-CS set [219].

So far, we have been discussing specialized synthesis tools over instruction sets that have
nice algebraic structures; little is known about exact synthesis over physically motivated
instruction sets for the NISQ era. For instance, unlike a fault-tolerant (FT) machine with a finite
set of instructions (e.g., Clifford+T gates), a NISQ machine typically supports arbitrary-angle
rotations implemented via analog physical pulses with high precision, hence the single-qubit
rotation gates need not be synthesized. Finding efficient synthesis algorithms that are tied to
realistic physical platforms, e.g., a noise-aware synthesizer, remains an open problem.


6.1.3 APPROXIMATE SYNTHESIS 


Suppose the unitary operator U represents the target transformation, and V is the unitary
operator that is actually implemented using gates from the given set G. Then V is an approximate
implementation of the transformation U using the gate set G.


Approximation Metrics 


The distance of the approximate implementation V to the ideal one U is defined as:

$$D(U, V) = \sup_{|\psi\rangle} \| (U - V) |\psi\rangle \|,$$

where the supremum is over all possible pure quantum states $|\psi\rangle$, and
$\|x\| = \sqrt{x^\dagger x}$ is the 2-norm of vector x. The intuition behind this definition is that
if D(U, V) is small, then the measurement outcomes of $U|\psi\rangle$ have nearly the same
statistics as those of $V|\psi\rangle$ for any initial state $|\psi\rangle$. Similarly, the operator
norm is defined by:

$$\|U\| = \sup_{|\psi\rangle \neq 0} \frac{\| U |\psi\rangle \|}{\| |\psi\rangle \|}.$$

It can be shown that the above distance measure is equivalent to the trace distance of
the two unitary operators (up to normalization), which is easier to compute:

$$D_{\mathrm{tr}}(U, V) = \frac{1}{2} \| U - V \|_1,$$

where $\|X\|_1 = \mathrm{tr}\sqrt{X^\dagger X}$ is the 1-norm of matrix X.

The unitary V is said to be an $\epsilon$-approximation of the unitary operator U if
$D(U, V) < \epsilon$. In the context of unitary synthesis, V is sometimes written as a sequence of
instructions $g_1, g_2, \ldots, g_m$ from the gate set G, that is, $V = g_m \cdots g_2 g_1$.
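
Both distance measures can be evaluated directly for single-qubit gates. The following is a
minimal sketch restricted to 2x2 matrices, where the singular values of U - V have a closed form;
the function names and the example rotation `rz` are assumptions made for illustration.

```python
import cmath
import math

def singular_values(a):
    """Singular values of a 2x2 complex matrix via the eigenvalues of A^dagger A."""
    m00 = abs(a[0][0]) ** 2 + abs(a[1][0]) ** 2
    m11 = abs(a[0][1]) ** 2 + abs(a[1][1]) ** 2
    m01 = a[0][0].conjugate() * a[0][1] + a[1][0].conjugate() * a[1][1]
    tr, det = m00 + m11, m00 * m11 - abs(m01) ** 2
    disc = math.sqrt(max(tr * tr - 4 * det, 0.0))
    return [math.sqrt(max((tr + disc) / 2, 0.0)),
            math.sqrt(max((tr - disc) / 2, 0.0))]

def difference(u, v):
    return [[u[i][j] - v[i][j] for j in range(2)] for i in range(2)]

def operator_distance(u, v):
    """D(U, V): the largest singular value of U - V."""
    return max(singular_values(difference(u, v)))

def trace_distance(u, v):
    """(1/2) ||U - V||_1: half the sum of the singular values of U - V."""
    return 0.5 * sum(singular_values(difference(u, v)))

def rz(theta):
    """Rotation about the z axis, used here only as a convenient test gate."""
    return [[cmath.exp(-1j * theta / 2), 0], [0, cmath.exp(1j * theta / 2)]]

U = rz(0.30)
V = rz(0.31)   # a slightly over-rotated implementation of U
print(operator_distance(U, V), trace_distance(U, V))
```

For this pair, both singular values of U - V equal $2\sin(0.0025)$, so the two measures coincide;
in general they differ by at most a dimension-dependent constant factor.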

Errors Accumulate Linearly
A simple fact that enables many efficient approximate synthesis algorithms is the subadditivity
of errors [9].


Lemma 6.6 If $U_1, \ldots, U_t$ and $V_1, \ldots, V_t$ are operators such that $\|U_i\| \le 1$,
$\|V_i\| \le 1$, and $\|U_i - V_i\| < \delta$ for all $i \in [t]$, then

$$\| U_t \cdots U_2 U_1 - V_t \cdots V_2 V_1 \| < t\delta.$$


This subadditivity property is the basis of many approximation algorithms.


Approximating Arbitrary Unitaries 
Approximating an arbitrary unitary operator is generally very hard. Fortunately, any single-qubit
gate can be approximated to arbitrary accuracy $\epsilon$ using generally $O(1/\epsilon)$ gates from a
finite gate set. A typical gate set of choice is the Clifford+T set, consisting of the H gate, S
gate, and T gate. The Solovay-Kitaev (SK) algorithm shows that we can do much better: any
single-qubit gate can be approximated to arbitrary accuracy $\epsilon$ using only
$O(\log^c(1/\epsilon))$ gates. The key idea for approximating a unitary U to arbitrary precision is to
construct a finer and finer $\epsilon$-net (i.e., a very small volume ball in SU(d)) around the
identity using the group commutator $VWV^\dagger W^\dagger$ repeatedly. Formally, the $\epsilon$-ball can
be defined as $B_\epsilon = \{U \in SU(d) : \|U - I\| < \epsilon\}$. Then the Solovay-Kitaev theorem
states the following.


Theorem 6.7 For any two universal instruction sets $S, T \subset SU(2^n)$ (closed under inverses,
i.e., for any W in S or T, $W^{-1}$ is also in S or T) on n qubits, and a precision $\delta > 0$, if a
unitary transformation A consists of L instructions $U_1, \ldots, U_L \in S$, then A can also be
implemented with precision $\delta$ using $M = O(L \log^c(L/\delta))$ instructions
$V_1, \ldots, V_M \in T$ such that $\| U_L \cdots U_1 - V_M \cdots V_1 \| \le \delta$. (Here, c is between
3 and 4, and there is a classical algorithm for finding such a circuit $V_i \in T$ in time
$O(L \log^c(L/\delta))$.)


We now describe the SK algorithm for synthesizing a single-qubit gate U to arbitrary
accuracy $\epsilon$ with a polylogarithmic running time in $1/\epsilon$ (Algorithm 6.1).

Algorithm 6.1 [220] is a recursive algorithm for approximating a unitary operator U with
accuracy $\epsilon_n$, where accuracy is better for larger n. Line 2 is the base case, where a basic
approximation to U is found by building a lookup table of length-$\ell$ sequences of gates, among
which we pick the closest to U. In line 5, we find the appropriate group commutator such that
$U U_{n-1}^\dagger = VWV^\dagger W^\dagger$. With this algorithm, it can be shown that accuracy improves
as the recursive depth increases, that is $\epsilon_n < \epsilon_{n-1} < \cdots < \epsilon_0$. Details of
the algorithm are omitted here; we refer the interested reader to the tutorial in [220].

Algorithm 6.1 Solovay-Kitaev (SK) algorithm for synthesizing single-qubit gates

Function SolovayKitaev(single-qubit gate U, accuracy level n)
1: if n == 0 then
2:     return Closest approximation to U
3: else
4:     $U_{n-1}$ <- SolovayKitaev(U, n - 1)
5:     V, W <- Decompose $U U_{n-1}^\dagger$ into a balanced group commutator
6:     $V_{n-1}$ <- SolovayKitaev(V, n - 1)
7:     $W_{n-1}$ <- SolovayKitaev(W, n - 1)
8:     return $U_n = V_{n-1} W_{n-1} V_{n-1}^\dagger W_{n-1}^\dagger U_{n-1}$
9: end if

An improvement to the base case search procedure has been developed by Amy et al. [221].
The "meet-in-the-middle" algorithm finds the minimal depth circuit with improved running
time. The best known value for c is close to 2. It is still an exciting open problem to determine
the best value for c (i.e., the shortest output sequence), possibly somewhere between 1 and 2.


Optimal Approximate Synthesis 
We can bound the number of gates required for synthesizing (approximately) an arbitrary unitary
transformation by a simple volume counting argument.

Recall that in the Solovay-Kitaev theorem, we define the $\epsilon$-ball in SU(d) around a given
unitary $V \in SU(d)$ as $B_\epsilon = \{U \in SU(d) : \|U - V\| < \epsilon\}$. It is understood that
SU(d) is a $(d^2 - 1)$-dimensional manifold, so the volume of the ball is proportional to
$\epsilon^{d^2 - 1}$. Hence, it takes $O((1/\epsilon)^{d^2 - 1})$ different strings of gates to represent
the elements in SU(d) to precision $\epsilon$. Therefore, no algorithm can achieve synthesis of a
unitary in SU(d) using asymptotically fewer than $\log(1/\epsilon)$ gates.
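
The last step of the counting argument can be spelled out as follows. This is a sketch under
stated assumptions: g denotes the (fixed) number of instructions in the gate set, $\ell$ the
length of a gate string, and c a positive constant; these symbols are introduced here for
illustration and do not appear in the original argument.

```latex
\begin{align*}
\#\{\text{gate strings of length at most } \ell\}
  &= \sum_{j=0}^{\ell} g^{j} \;\le\; g^{\ell+1} \qquad (g \ge 2), \\
\#\{\epsilon\text{-balls needed to cover } SU(d)\}
  &\ge c \left(\frac{1}{\epsilon}\right)^{d^2-1}, \\
g^{\ell+1} \ge c \left(\frac{1}{\epsilon}\right)^{d^2-1}
  &\;\Longrightarrow\;
  \ell \;\ge\; \frac{(d^2-1)\log(1/\epsilon) + \log c}{\log g} - 1
  \;=\; \Omega\!\left(\log(1/\epsilon)\right).
\end{align*}
```

Each gate string lands in at most one ball, so there must be at least as many strings as balls,
which gives the logarithmic lower bound for fixed d and g.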

A natural question to ask is: can some instruction sets saturate this lower bound? In other
words, does there exist an instruction set S such that one can synthesize an arbitrary unitary in
SU(d) using $O(\log(1/\epsilon))$ gates? Fortunately, the answer is positive; at least for some
universal instruction sets, such an optimal synthesis algorithm exists. Harrow, Recht, and Chuang
[222] showed a sufficient condition on S that allows for efficient universality.


6.1.4 HIGHER-DIMENSIONAL SYNTHESIS 


Instead of using binary logic to target two-level qubits, compilers can target n-ary logic
composed of qudits (i.e., d-level quantum systems). So far, in this book, we have been focusing on
qubits alone; generalizations to higher-dimensional logic have proven useful in circuit synthesis.
In particular, quantum computations use a lot of temporary qubits (called ancilla). Ancilla bits
are particularly necessary when performing arithmetic, since all quantum computations must
be reversible (in order to conserve energy and avoid collapsing the quantum system to a
classical state). Classical arithmetic can be computed with a universal NAND gate, but a NAND
has two input bits and one output bit. It is impossible to reverse a NAND, since one output
bit is not enough information to fully specify the two input bit values. The smallest universal
reversible logic gate, which can simulate a NAND, has three inputs and three outputs, but one
input and two outputs end up being ancilla. Consequently, arithmetic in quantum computations
(which is very common) generates many ancilla. Qutrits (three-level systems) allow us to
essentially generate ancilla by borrowing a third energy state in a quantum device to hold extra
information.
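
The ancilla overhead of reversible logic can be seen on a small truth table. The sketch below
assumes the three-input reversible gate in question is the Toffoli (controlled-controlled-NOT)
gate, which is a standard choice but is not named in the text; the function names are
illustrative.

```python
from itertools import product

def toffoli(a, b, c):
    """Reversible Toffoli (CCNOT) gate: flips c if and only if a and b are both 1."""
    return a, b, c ^ (a & b)

def nand_via_toffoli(a, b):
    # The third input is an ancilla fixed to 1. The third output carries NAND(a, b),
    # while the first two outputs are leftover copies of the inputs.
    _, _, out = toffoli(a, b, 1)
    return out

for a, b in product((0, 1), repeat=2):
    print(a, b, "->", nand_via_toffoli(a, b))
```

One input (the constant 1) and two outputs (the copies of a and b) play no role in the NAND
result itself, which matches the ancilla count described above.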

Qudit synthesis can asymptotically change the critical path of quantum circuits. A number
of recent algorithms [211, 223-226] have been developed for qutrit gate sets. In the NISQ
era, the practical tradeoff, however, is that the third energy state is generally more susceptible
to errors than the first two states used for binary qubits. Fortunately, noise models show that
overall error can be decreased with qutrits. This is because computations require less time, and
the third state is not always in use.


6.2 CLASSICAL VS. QUANTUM COMPILER 
OPTIMIZATION 


Despite the unique properties of quantum circuits, we emphasize that quantum compiling is 
not completely different from classical compiling. As such, there are generally three classes of 
quantum compiling techniques: 


* classical optimizations under classical constraints; 
* classical optimizations under quantum constraints; and 
* quantum optimizations under quantum constraints. 


A quantum compiler must deal with control flow optimizations just like a classical com- 
piler. The most notable classical optimizations under classical constraints that have been applied 
to quantum compiling include procedures such as loop unrolling, procedure cloning, inter- 
procedural constant propagation, etc. The classical constraints, such as data dependencies, 
pipelining, and synchronizations, still apply in the context of quantum compiling. These op- 
timizations inherited from classical compilation can significantly reduce the cost of a quantum 
circuit [170]. 

Fortunately, we have an abundance of techniques to inherit from classical compiler
optimizations; for instance, the pass-driven approach in the LLVM compiler framework has been
adopted for quantum compilers.

IBM's Qiskit transpiler, as shown in Figure 6.5, is one example of a pass-driven
compilation framework. Multiple passes, each specialized in a different compilation task, are
implemented in the framework. The user is free to chain together a subset of passes according
to the target program's needs.
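
The pass-driven idea can be illustrated without any framework. The following toy sketch is not
Qiskit's API: the gate representation, the two passes, and `run_passes` are all hypothetical
names chosen for illustration. Each pass is a function from a gate list to a gate list, and the
user chains the passes they want.

```python
# Gates are (name, qubits) tuples; a circuit is a list of gates in program order.
SELF_INVERSE = {"h", "x", "z", "cx"}

def cancel_adjacent_inverses(circuit):
    """Remove back-to-back identical self-inverse gates acting on the same qubits."""
    out = []
    for gate in circuit:
        if out and out[-1] == gate and gate[0] in SELF_INVERSE:
            out.pop()
        else:
            out.append(gate)
    return out

def merge_s_gates(circuit):
    """Rewrite two consecutive S gates on one qubit as a single Z gate (S*S = Z)."""
    out = []
    for gate in circuit:
        if out and gate[0] == "s" and out[-1] == gate:
            out[-1] = ("z", gate[1])
        else:
            out.append(gate)
    return out

def run_passes(circuit, passes):
    """Apply the chosen passes in order, feeding each result to the next pass."""
    for p in passes:
        circuit = p(circuit)
    return circuit

circuit = [("h", (0,)), ("h", (0,)), ("s", (1,)), ("s", (1,)), ("z", (1,)), ("cx", (0, 1))]
print(run_passes(circuit, [merge_s_gates, cancel_adjacent_inverses]))
print(run_passes(circuit, [cancel_adjacent_inverses, merge_s_gates]))
```

The two orderings give different results on this input, which hints at why the choice and order
of passes is left to the user.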

Figure 6.5: The compilation framework in Qiskit Terra developed by IBM [165].


Classical optimizations under quantum constraints play a crucial role in quantum compiling.
This is where we need to adapt the familiar strategies in classical compilers to the unique
quantum constraints. Communication cost of two-qubit interactions is one example of such quantum
constraints. [161] has shown that routing strategies used in distributed systems can be
borrowed after being adapted to the quantum communication cost. Register allocation algorithms
also provide insights on how to allocate and reuse qubits in a quantum circuit [227]. Some
other examples of such optimizations include mapping qubits to less noisy parts of a quantum
machine [228-231], and scheduling parallel gates to avoid crosstalk between qubits [232].

Finally, quantum optimization under quantum constraints refers to using quantum techniques
to optimize for quantum compiling. For example, quantum programs can be stored in
qubits (a quantum analogue of a stored-program architecture) using gate teleportation
[233-235]. Other examples include quantum circuit synthesis tools that exploit the algebraic
structures in unitary gates (see Section 6.1) and scheduling tools using teleportation [236].
Another example in this spectrum is to use a quantum computer to perform quantum
compiling [237].


6.3 GATE SCHEDULING AND PARALLELISM


This section describes the impact of an important instruction-level optimization: gate
scheduling. A schedule of a quantum program is a sequence of gate operations on logical qubits.
The sequence ordering defines data dependencies between gates, where a gate $g_2$ depends on $g_1$
if they share a logical qubit and $g_2$ appears later in the schedule than $g_1$.

A quantum circuit executes from left to right. A qubit can only be involved in one quantum
gate at a time. Data dependencies determine the sequential and parallel execution ordering of
the gates in a quantum circuit. For example, two sequential gates back to back in general have
strict ordering constraints:

    q1: --[U]--[V]--   ≠   q1: --[V]--[U]--

For two arbitrary unitary operators U and V, swapping the order of the two generally
yields different results. One can quickly verify by writing down the matrix multiplications for
two unitary matrices:

$$U \cdot V \neq V \cdot U.$$

Although not always equivalent, unitary matrices can sometimes be reordered in special 
cases. When that happens, we say the two matrices commute with each other. We will elaborate 
on the commutation relations later in the section. 

Two parallel gates side by side in general have no ordering constraints:

    q1: --[U]--       q1: --[U]-------       q1: -------[U]--
                  =                      =
    q2: --[V]--       q2: -------[V]--       q2: --[V]-------

Temporal ordering of parallel gates does not matter because we can check the corresponding
tensor products and verify:

$$U \otimes V = (U \otimes I) \cdot (I \otimes V) = (I \otimes V) \cdot (U \otimes I).$$


The impact of gate scheduling can be quite significant in quantum circuits, and many algorithm
implementations rely upon the execution of gates in parallel in order to achieve substantial
algorithmic speedup. Gate scheduling in quantum algorithms differs from classical instruction
scheduling, as gate commutativity introduces another degree of freedom for schedulers to
consider. Compared to the field of classical instruction scheduling, quantum gate scheduling has
been relatively understudied, with only a few systematic approaches proposed that incorporate
these new constraints.

Figure 6.6: A data dependency graph for a quantum circuit with 8 gates: $g_1, \ldots, g_8$.


6.3.1 PRIMARY CONSTRAINTS IN SCHEDULING 


Data Dependency 

Some gates are required to be applied after the completion of previous gates. This requirement
can come from the sharing of qubits, or from specifications in the quantum algorithm (such as
barriers and timing constraints). The previous section has illustrated the sequential and parallel
ordering of quantum gates. It is convenient to construct a data dependency graph (DDG) when
analyzing a quantum circuit. A DDG G = (V, E) is defined as follows: each vertex v in V represents
a quantum gate, and a directed edge from $v_1$ to $v_2$, that is $(v_1, v_2) \in E$, means the gate
in $v_1$ depends on the gate in $v_2$. Note that indirect dependencies are not drawn in the graph;
if $v_3$ depends on $v_2$ and $v_2$ depends on $v_1$, we do not draw an edge for the transitive
dependence between $v_1$ and $v_3$. For example, we can draw the DDG for a sample quantum circuit
(omitting the measurements at the end), as shown in Figure 6.6.
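
A DDG of this kind can be built in one sweep over the gate list. The following is a minimal
sketch, assuming gates are given as (name, qubits) tuples in schedule order and that dependencies
arise only from shared qubits; the function names and the example circuit are hypothetical. Edges
are stored as (dependent gate, prerequisite gate), following the edge direction used in the
definition above.

```python
def build_ddg(gates):
    """Return edges (later, earlier): the later gate depends on the earlier one."""
    last_on_qubit = {}
    edges = set()
    for idx, (_, qubits) in enumerate(gates):
        for q in qubits:
            if q in last_on_qubit:
                edges.add((idx, last_on_qubit[q]))
            last_on_qubit[q] = idx
    return edges

def transitive_reduction(edges):
    """Drop every edge (u, v) for which v is still reachable from u by another path."""
    succ = {}
    for u, v in edges:
        succ.setdefault(u, set()).add(v)

    def reachable(src, dst, skip):
        stack, seen = [src], set()
        while stack:
            n = stack.pop()
            for m in succ.get(n, ()):
                if (n, m) == skip or m in seen:
                    continue
                if m == dst:
                    return True
                seen.add(m)
                stack.append(m)
        return False

    return {(u, v) for (u, v) in edges if not reachable(u, v, (u, v))}

gates = [("h", (0,)), ("cx", (0, 1)), ("t", (0,)), ("cx", (0, 1))]
direct = build_ddg(gates)
print(sorted(direct))
print(sorted(transitive_reduction(direct)))
```

Here gate 3 shares qubit 1 with gate 1, but that dependence is already implied by the chain
3 -> 2 -> 1 on qubit 0, so the reduction removes the edge (3, 1).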

Besides obeying the data dependency constraints to ensure logical correctness like a classical
scheduler, a quantum scheduler must also respect other constraints, such as the impact of
noise while running a quantum circuit. In the following, we highlight a number of important
considerations a quantum gate scheduler needs to take into account to ensure successful execution
of a quantum circuit.

Figure 6.7: Left: the connectivity graph of a $5 \times 6$ 2D mesh. Vertices are qubits, and two
vertices are connected if the qubits are connected. Right: when the two center qubits are chosen
for a two-qubit gate at an interaction frequency (highlighted in red), the surrounding qubits must
be tuned off resonance from this interaction frequency, hence another two-qubit gate at the same
frequency cannot be scheduled on any orange edges.


Hardware Limitations 

To execute parallel instructions, a machine must be able to independently and simultaneously
address and drive the individual qubits of interest. In NISQ computers, this capability usually
comes with hardware constraints. For example, in a common 2D superconducting transmon architecture,
each qubit is coupled to at most its four nearest neighbors, which means that two-qubit
gates can only be performed on adjacent qubits. Long-distance interactions are enabled by swapping
qubits closer, hence inducing communication overhead. Furthermore, a NISQ machine typically
has limited parallelism support. Although each qubit is connected independently to its
drive line(s), sending pulse signals simultaneously down to two neighboring qubits could result
in crosstalk noise, depending on the coupling strength between them. In some designs, simultaneous
gates are prohibited if they are scheduled too close to each other, in order to prevent
such crosstalk errors. Figure 6.7 illustrates the qubits that can be affected by crosstalk errors (to
first order) if their frequencies were tuned improperly, assuming a frequency-tunable transmon
architecture.

A trapped-ion machine has unique scheduling constraints as well. In particular, the linear- 
trap tape-like architecture dictates that simultaneous gates can only be performed on contiguous 
qubits within a sliding window. This is due to the hardware limitation that qubits are controlled 
by laser beams of bounded width, typically smaller than the total number of qubits in a trap. One 
can only schedule a gate when (i) all previously dependent gates are scheduled and (ii) current 
laser beam window covers all operand qubit(s). 

Commutation Relations

In a few cases, the sequential ordering constraints can be relaxed. As suggested in the last section, 
this can be illustrated by commutation relations. More specifically, we introduce the following 
definitions. 


Definition 6.8 The commutator of two gates (operators) A, B is defined as 
[A, B] = AB — BA. 
If [A, B] = 0, then we say A and B commute. 


In the context of scheduling, the commutation relation means that the two gates A, B are
free to be reordered:

    q1: --[A]--[B]--   =   q1: --[B]--[A]--

Finding two commuting operators may not be easy. A tool generalized from the commu- 
tation relation can be very useful when scheduling two sequential operators, namely, finding the 
conjugate of an operator. 


Definition 6.9 Two gates A and B are conjugate if there exists another gate U such that
$UAU^\dagger = B$.


A simple rewrite of the above equality results in the following:

$$UAU^\dagger = B \;\Rightarrow\; UAU^\dagger U = BU \;\Rightarrow\; UA = BU.$$


As shown in the circuit model, we have found a way of moving one gate U to the right
through another gate A (with the caveat that now A becomes B):

    q1: --[U]--[A]--   =   q1: --[B]--[U]--
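
Both relations are easy to check numerically for concrete gates. The sketch below verifies only
the algebraic identities of Definitions 6.8 and 6.9 (not any particular circuit drawing), using
the standard facts that Z and S commute, X and Z do not, and $HXH^\dagger = Z$; the helper names
are illustrative.

```python
import math

# 2x2 complex matrices are stored as nested lists; all helpers are illustrative.
def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def dagger(a):
    return [[a[j][i].conjugate() for j in range(2)] for i in range(2)]

def close(a, b, tol=1e-12):
    return all(abs(a[i][j] - b[i][j]) < tol for i in range(2) for j in range(2))

def commute(a, b):
    """True if [A, B] = AB - BA vanishes (up to numerical tolerance)."""
    return close(matmul(a, b), matmul(b, a))

def conjugate_by(u, a):
    """Return U A U^dagger."""
    return matmul(u, matmul(a, dagger(u)))

r = 1 / math.sqrt(2)
H = [[r, r], [r, -r]]
X = [[0, 1], [1, 0]]
Z = [[1, 0], [0, -1]]
S = [[1, 0], [0, 1j]]

print(commute(Z, S))                  # diagonal gates commute
print(commute(X, Z))                  # X and Z do not commute
print(close(conjugate_by(H, X), Z))   # H X H^dagger = Z
print(close(matmul(H, X), matmul(Z, H)))   # hence H X = Z H
```

With U = H, A = X, and B = Z, the last line is the identity UA = BU from the derivation above.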
































It is sometimes useful to construct a data dependency graph that reflects the commutation
relations. The commutation-relaxed data dependency graph (CRDDG) [238] is defined
by first identifying groups of mutually commuting gates in the circuit, removing the
dependencies of gates within a commuting set, and deriving new dependencies of any gates between
two different sets. Suppose the gates in the example circuit can be grouped into the
mutually commuting sets shown in Figure 6.8. The corresponding CRDDG can be drawn as
in Figure 6.9.

Figure 6.9: The commutation-relaxed data dependency graph (CRDDG) of the sample quantum
circuit in Figure 6.8.

Notice that the dependencies between $g_1$, $g_2$, and $g_3$ are removed due to the commutation
relations. A less obvious additional dependency is from $g_1$ to $g_5$: because we may swap the
order of $g_1$ and $g_2$ when allowed by their commutation relation, $g_5$ could become directly
adjacent to $g_1$.


Qubit Decoherence 

In a NISQ machine, qubits have limited lifetime due to spontaneous decoherence. This is
commonly referred to as idle noise on the qubits. A scheduler is responsible for maximizing the
utilization of the qubits within their lifetimes, that is, from the moment they are initialized for
computation until decoherence happens. Depending on the technology, the number of gates
that are safe to execute on a qubit before the chance of decoherence becomes too significant varies.

As a result, a common goal in scheduling a quantum circuit is to minimize the circuit depth,
that is, the length of the critical path of the circuit. It is closely related to the total running
time of the circuit, and is sometimes called the total latency of the circuit. If the circuit depth
is bounded within the lifetime of any qubit, then the circuit is generally safe from decoherence
noise.

Another finer-grained measure that more precisely characterizes the utilization of qubits
is the space-time volume. It accounts for the exact active usage (during which the qubits suffer
from the risk of decoherence) of each qubit. Notice that when a qubit is in the ground state
$|0\rangle$, it is safe from decoherence. For instance, from the moment an ancilla qubit is freed
(restored to the ground state) until it is reused for computation, it stays safely in the ground
state. For this reason, it does not contribute to the space-time volume. To accurately estimate the
workload of a program, we define the space-time volume of a program as:

$$V = \sum_{q \in Q} \; \sum_{(t_i, t_f) \in T_q} (t_f - t_i),$$

where Q is the set of all qubits in the system, and $T_q$ is a sequence of pairs
$\{(t_i^0, t_f^0), (t_i^1, t_f^1), \ldots, (t_i^{|T_q|-1}, t_f^{|T_q|-1})\}$. Each pair corresponds to a
qubit usage segment, that is, $t_i^k$ and $t_f^k$ are the allocation time and reclamation time of the
kth time that qubit q is being used, respectively. V is high when a large number of qubits stay
"live" (in-use) during the execution; thus, the higher the volume, the more costly it is to execute
on that target machine.
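
As a small worked example of the definition, the sketch below sums the usage segments of each
qubit. The dictionary layout, the qubit names, and the time values are hypothetical and chosen
only to make the arithmetic easy to follow.

```python
def space_time_volume(usage):
    """usage maps each qubit to its list of (t_alloc, t_reclaim) usage segments."""
    return sum(t_f - t_i
               for segments in usage.values()
               for (t_i, t_f) in segments)

usage = {
    "q0": [(0, 10)],          # live for the whole program
    "q1": [(2, 5), (7, 9)],   # used twice, idle in the ground state in between
    "anc": [(3, 4)],          # short-lived ancilla
}
volume = space_time_volume(usage)
print(volume)   # 10 + 3 + 2 + 1 = 16
```

The idle gap of q1 between times 5 and 7 is not counted, reflecting the assumption that a qubit
restored to the ground state does not contribute to the volume.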

One drawback of this definition is that it still does not fully reflect the impact of noise in 
the system. For instance, making the gate-dependent errors more explicit in the characterization 
will correlate the metric better with the cost for successful execution of the program. Space-time 
volume makes one step in that direction, but it remains an open problem to find a more effective 
metric. 


6.3.2 SCHEDULING STRATEGIES 


Some common scheduling strategies include the following. 


* ALAP (As-Late-As-Possible) Scheduling. One of the simplest scheduling algorithms is
  the ALAP (as-late-as-possible) scheduler. In essence, it starts at the end of the
  circuit, schedules the last gates that need to be completed, and works backward to their
  preceding gates. The advantage of the ALAP scheduler, as opposed to ASAP (as soon as
  possible), is that qubits are initialized only when absolutely needed. Since qubits have
  limited lifetime, it is beneficial to initialize them as late as possible. This scheduler is
  implemented in Qiskit [165] by IBM.


* LPF (Longest-Path-First) Scheduling. When programs have more complex control flow, 
prioritization is needed between different parallel modules. The LPF (longest-path- 
first) scheduler tries to avoid increasing the critical path of the program, thus reducing 
the circuit depth. [160, 239] are some example LPF schedulers. 


* Communication-Aware Scheduling. More advanced schedulers take into account the cost
  of communication, due to two-qubit gates between operands that are far apart.
  Communication cost varies across architectures, and so do the scheduling strategies.
  Here we highlight a general technique commonly used for reducing communication:
  barrier insertion. When a two-qubit gate is known to cause high communication
  cost (such as a long swap distance), separating it from other gates along the critical path
  can be effective. It has been shown that iterative algorithms can benefit from the introduction
  of barriers as well, as inserting a barrier at the end of each round creates clean
  divisions between the rounds [161]. Details about communication cost can be found in
  Section 6.4 on qubit mapping and reuse.


* Adaptive Scheduling. When there are multiple implementations of the same gate, it
  is possible to let the scheduler choose the best one based on gate time, qubit fidelity,
  gate noise, etc. Due to our limited understanding of the noise characteristics of NISQ
  machines, this strategy remains a challenge.
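
The difference between ASAP and ALAP scheduling can be sketched in a few lines. The code below is
a simplified model, not the Qiskit implementation: it assumes every gate takes one time step,
that the only constraint is that a qubit participates in one gate at a time, and that gates are
given as tuples of qubit indices in program order.

```python
def asap(gates):
    """Assign each gate the earliest time step at which all its qubits are free."""
    free_at = {}
    times = []
    for qubits in gates:
        t = max((free_at.get(q, 0) for q in qubits), default=0)
        times.append(t)
        for q in qubits:
            free_at[q] = t + 1
    return times

def alap(gates):
    """Schedule backward from the end of the circuit, then flip the time axis."""
    rev = asap(list(reversed(gates)))
    depth = max(rev, default=-1) + 1
    return [depth - 1 - t for t in reversed(rev)]

# Three gates on qubit 0, one gate on qubit 1, then a two-qubit gate on both.
gates = [(0,), (0,), (0,), (1,), (0, 1)]
print(asap(gates))   # [0, 1, 2, 0, 3]
print(alap(gates))   # [0, 1, 2, 2, 3]
```

Both schedules have depth 4, but ALAP delays the single-qubit gate on qubit 1 from step 0 to
step 2, so that qubit is initialized later and idles for less time before the two-qubit gate.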


Scheduling can be done at the logical level or the physical level. In NISQ machines, we
have only the physical level, as there is no error correction, whereas in error-corrected machines
we have both the logical (fault-tolerant) and physical levels. In the latter, it makes sense to
apply these optimizations at both levels, that is, to schedule the application program as well as
to schedule the error correction routine [161, 240].

In the next section, we introduce another important compiler optimization step: qubit 
mapping and reuse. These optimizations are typically applied serially in a single pass in modern 
quantum compilers [165, 170], but it is not always clear which optimization should go first; 
sometimes it is necessary to redo the same optimization. The complexity goes up if one goes 
back to redo optimizations so as to obtain a better-optimized code. 


6.3.3 HIGHLIGHT: GATE TELEPORTATION 


In this section, we highlight a remarkable phenomenon called “quantum teleportation,” and
its important applications in gate scheduling, namely a strategy called “gate teleportation,” in
which scheduling gates is equivalent to scheduling resource states. We start by writing down an
important resource state, the EPR pair $|\psi\rangle = \frac{1}{\sqrt{2}}(|00\rangle + |11\rangle)$,
which can be produced by the following circuit:

    |0> --[H]--*---
               |        = (|00> + |11>)/sqrt(2)
    |0> ------(+)--

Here we introduce the fundamentals behind a teleportation-based quantum computer 
(QC) architecture, across which such EPR pairs are utilized as resource states. Because both 
quantum states and quantum gates can be transferred over long distance via a teleportation 
circuit, this technique is particularly useful in distributed QC architectures or in reducing com- 
munication cost in general. 

Consider scheduling a remote two-qubit gate between two qubits that are too far apart to apply
the joint gate pulse directly. In a non-teleportation-based QC architecture, one has to move the
pair of qubits closer via physical movement or a series of swap gates, so as to perform the
two-qubit gate. With teleportation, we have two more options, namely to move a quantum state
closer to the other by teleporting the state, or to perform a remote two-qubit gate by teleporting
the gate.


Teleporting States 

Suppose we have an arbitrary quantum state $|\psi\rangle$ and the EPR pair. The following circuit
first destroys the state $|\psi\rangle$ via a Bell-basis measurement, and subsequently restores
$|\psi\rangle$ on one of the qubits in the EPR pair.

[Teleportation circuit: the top qubit holds $|\psi\rangle$; the bottom two qubits, each initialized
to $|0\rangle$, are prepared as an EPR pair (boxed area); a Bell-basis measurement on the top two
qubits controls X and Z gates on the bottom qubit, which outputs $|\psi\rangle$.]

Notice that the boxed area is an EPR pair that can be prepared in advance. Now suppose
the top two qubits are measured with outcomes $x, z \in \{0, 1\}$, respectively. Before the
conditional X and conditional Z are applied, the bottom qubit is in a quantum state
$|\phi\rangle = X^x Z^z |\psi\rangle$. Therefore, with appropriate recovery operations X and Z, we
arrive at $|\psi\rangle$ in the bottom qubit.

This is called a “teleportation circuit,” because one can interpret this procedure as follows:
Alice holds the top two qubits ($|\psi\rangle$ and one of the qubits in the EPR pair prepared in
advance), while Bob holds the bottom qubit. She wants to send him the quantum state
$|\psi\rangle$. She can do so by performing the Bell measurement on her qubits and sending Bob two
classical bits (e.g., over the phone). Upon applying the recovery gates X and Z, Bob can transform
his qubit to $|\psi\rangle$.
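
The protocol can be checked with a small state-vector simulation. The sketch below fixes its own
conventions, which are assumptions and may differ from the book's figure: qubit 0 holds the state
to teleport, qubits 1 and 2 form the EPR pair, m0 and m1 are the measurement outcomes of qubits 0
and 1, and Bob applies X if m1 = 1 followed by Z if m0 = 1.

```python
import math

N = 3  # qubit 0 is the most significant bit of the basis-state index

def apply_h(state, q):
    """Apply a Hadamard gate to qubit q of a 3-qubit state vector."""
    r = 1 / math.sqrt(2)
    mask = 1 << (N - 1 - q)
    new = [0j] * len(state)
    for idx, amp in enumerate(state):
        bit = 1 if idx & mask else 0
        new[idx & ~mask] += r * amp                          # |0> component
        new[idx | mask] += r * amp * (-1 if bit else 1)      # |1> component
    return new

def apply_cnot(state, c, t):
    """Apply a CNOT with control qubit c and target qubit t."""
    cmask, tmask = 1 << (N - 1 - c), 1 << (N - 1 - t)
    new = [0j] * len(state)
    for idx, amp in enumerate(state):
        new[idx ^ tmask if idx & cmask else idx] += amp
    return new

def teleport(a, b):
    """Return Bob's corrected qubit for each measurement outcome (m0, m1)."""
    state = [0j] * 8
    state[0b000], state[0b100] = a, b                  # qubit 0 = a|0> + b|1>
    state = apply_cnot(apply_h(state, 1), 1, 2)        # EPR pair on qubits 1, 2
    state = apply_h(apply_cnot(state, 0, 1), 0)        # rotate into the Bell basis
    results = {}
    for m0 in (0, 1):
        for m1 in (0, 1):
            base = (m0 << 2) | (m1 << 1)
            bob = [state[base], state[base | 1]]       # unnormalized post-measurement state
            norm = math.sqrt(sum(abs(x) ** 2 for x in bob))
            bob = [x / norm for x in bob]
            if m1:
                bob = [bob[1], bob[0]]                 # X correction
            if m0:
                bob = [bob[0], -bob[1]]                # Z correction
            results[(m0, m1)] = bob
    return results

for outcome, bob in teleport(0.6, 0.8j).items():
    print(outcome, bob)
```

For the normalized input 0.6|0> + 0.8i|1>, each of the four outcomes should leave Bob with the
same amplitudes after the corrections.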






























































Teleporting Two-Qubit Gates 

Now consider an important application called "gate teleportation." It has been shown [233] that
for certain unitary gates U, one can use the above circuit to efficiently teleport a state "through"
U so that the implementation cost and the communication cost for $U|\psi\rangle$ are significantly
reduced. Specifically, we demonstrate the usefulness of a remote CNOT gate via teleportation. To
begin with, we draw the following circuit:

[Circuit: remote CNOT via gate teleportation on six qubits, with inputs $|\psi_1\rangle$ and
$|\psi_2\rangle$ and a four-qubit resource state.]

As a result, the two output qubits are $\mathrm{CNOT}(|\psi_1\rangle, |\psi_2\rangle)$. Notice that the
resource state $(|00\rangle + |11\rangle)|00\rangle/2 + (|01\rangle + |10\rangle)|11\rangle/2$ can be
prepared in advance, because it does not depend on the input qubits $|\psi_1\rangle$ and
$|\psi_2\rangle$. The rest of the circuit can be cut into two separate parts,
i.e., the first part consisting of the top three qubits and the second part consisting of the
bottom three qubits. Notice that the two parts can be arbitrarily far apart. Hence, using this
circuit, we have accomplished a remote CNOT gate, saving the cost of communication between the two
qubits. We remark that the communication cost is essentially replaced by the preparation cost
of the resource state; however, the resource state can be prepared offline (i.e., at compile time
and prior to the execution of the circuit) and distributed across the systems in advance.




















6.4 QUBIT MAPPING AND REUSE


Two convenient tools in analyzing qubit mapping are the device connectivity graph and the
program interaction graph. A device connectivity graph is a graph in which each node is a qubit and
two qubits are connected if they are allowed to interact directly. For example, in a
superconducting device, this means whether or not two qubits are linked by circuit wires (through
a coupler such as a capacitor); in a trapped ion device, this means whether or not laser beams can
simultaneously address the two qubits. Connectivity graphs for superconducting devices are
commonly of the 2D mesh/lattice type, as shown in Figure 6.10. In contrast, trapped ion devices
typically have much denser connectivity graphs, thanks to the flexibility in performing two-qubit
gates. Figure 6.11 shows some examples of trapped ion device connectivity graphs.

Given a quantum program, we can define a program interaction graph as a graph $G = (V, E)$,
where V is the set of logical qubits present in the computation and E is the set of two-qubit
interaction gates contained in the program (e.g., CNOT gates). By analyzing this graph,
we can perform an optimized mapping, which assigns a physical location to each logical qubit
$q \in V$. The program interaction graph of a toy circuit can be constructed by enumerating
all two-qubit gates, as shown in Figure 6.12.

Figure 6.10: Different connectivity graphs for superconducting devices. Left: 2D square lattice.
Right: 2D (heavy) hexagonal lattice. The choice of connectivity graph is typically based on
hardware constraints such as wiring bandwidth and noise of circuit components.





Figure 6.11: Different connectivity graphs for trapped ion devices. Left: Complete (clique)
connectivity for a small number of ions in one trap. Center: Weakly connected cliques for multiple
traps. Right: A long chain of ions in one trap in a tape-like architecture.

Figure 6.12: The program interaction graph for an example quantum circuit.


The goal of a mapper is to embed a program interaction graph in a device connectivity graph as
efficiently as possible. Here efficiency refers to objectives such as minimizing communication
and avoiding noisy qubits and links. Once the program interaction graph is embedded,
each edge (two-qubit interaction) is weighted by the cost of completing that interaction on the
mapped qubits.
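
As a rough illustration of these two steps, the sketch below builds an interaction graph from a
gate list and scores one embedding. The gate-list format, the function names, the toy circuit, and
the use of Manhattan distance on a 2D grid as the per-interaction cost are all assumptions made
for this example; a real mapper would use a device-specific cost.

```python
from collections import Counter

def interaction_graph(gates):
    """Map each unordered pair of logical qubits to its number of two-qubit gates."""
    edges = Counter()
    for _name, qubits in gates:
        if len(qubits) == 2:
            edges[frozenset(qubits)] += 1
    return edges

def mapping_cost(edges, placement):
    """Sum over edges of gate count times Manhattan distance on a 2D grid."""
    total = 0
    for pair, count in edges.items():
        a, b = tuple(pair)
        (r1, c1), (r2, c2) = placement[a], placement[b]
        total += count * (abs(r1 - r2) + abs(c1 - c2))
    return total

# Hypothetical toy circuit on logical qubits a, b, c.
gates = [("h", ("a",)), ("cx", ("a", "b")), ("cx", ("b", "c")),
         ("cx", ("a", "b")), ("cx", ("a", "c"))]
edges = interaction_graph(gates)

# One candidate embedding: the three qubits placed along a row of the grid.
line = {"a": (0, 0), "b": (0, 1), "c": (0, 2)}
cost = mapping_cost(edges, line)
```

Comparing `mapping_cost` across candidate placements gives a simple objective that a mapper
could try to minimize.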
Cost of Long-Distance Interactions

In the context of NISQ machines, long-distance interactions are typically resolved by moving
or swapping qubits. For simplicity, we refer to both as “routing qubits.” Some architectures (such
as trapped ion devices) support shuttling/transporting qubits directly, while others (such as
superconducting devices) bring two qubits together through a chain of swaps. Moving or swapping
qubits not only costs time but also introduces errors.

The device connectivity graph is very convenient for analyzing these communication costs.
We can model the overhead of long-distance interactions as weights on the edges of
the connectivity graph. For example, the weighted distance from qubit $q_i$ to qubit $q_j$ is used to
represent the cost of moving from location $q_i$ to location $q_j$. Most existing mappers follow a
two-step approach to find a good qubit mapping: (i) allocate qubits and (ii) route long-distance
interactions.
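
A minimal sketch of such a distance computation is given below. It assumes unit weights on every
link (so distance is just the number of hops) and a hypothetical 2x3 grid device; weighted links
would call for a shortest-path algorithm such as Dijkstra's instead of breadth-first search.

```python
from collections import deque

def hop_distance(adjacency, src, dst):
    """Breadth-first search for the fewest links between two physical qubits."""
    seen = {src}
    queue = deque([(src, 0)])
    while queue:
        node, dist = queue.popleft()
        if node == dst:
            return dist
        for nxt in adjacency[node]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    raise ValueError("qubits are not connected")

def swaps_needed(adjacency, src, dst):
    """Swaps required to bring src next to dst, assuming unit-cost links."""
    return max(hop_distance(adjacency, src, dst) - 1, 0)

# Hypothetical 2x3 grid device; physical qubits are numbered row by row.
grid = {0: [1, 3], 1: [0, 2, 4], 2: [1, 5],
        3: [0, 4], 4: [1, 3, 5], 5: [2, 4]}
```

For instance, qubits 0 and 5 sit at opposite corners of this grid, three hops apart, so two swaps
are needed before they can interact directly.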

Note that some work presents qubit routing as part of their scheduling strategies. But 
qubit mapping and gate scheduling are so intertwined that we expect them to be optimized 
together. 


6.4.1 FINDING A GOOD QUBIT MAPPING


Three common heuristics which we analyze for minimizing communication cost (sometimes re- 
ferred to as network congestion) are: edge distance minimization, edge density uniformity, and 
edge crossing minimization. In particular, the following analyses are performed on the program 
interaction graph mapped to the connectivity graph, denoted as the mapped interaction graph. 


* Edge Distance Minimization: The edge distance of a mapping can be defined as the
Euclidean distance between the physical locations of the two endpoints of each edge in the
mapped interaction graph. Intuitively, in classical systems network latency correlates
strongly with these distances, because longer edges take longer to execute.
A similar situation applies to NISQ machines, which communicate through swap chains:
longer swap chains take more steps to complete and are more likely to overlap
with other chains. It is worth noting that, for FT machines using surface code braiding
operations [27], there is no direct correspondence between the distance of a single braid
and the time to complete it. However, longer surface code braids are more likely
to overlap than shorter braids simply because they occupy a larger area on the network,
so minimizing the average braid length may reduce the induced network congestion as
well [161].


* Edge Density Uniformity: When two edges in the mapped interaction graph are very
close to each other, they are more likely to intersect and cause congestion. Ideally, we
would like to maximize the spacing between the edges and distribute them across the
network as uniformly as possible. This edge-edge repulsion heuristic
therefore aims to maximize the spacing between operations across the machine.

* Edge Crossings Minimization: We define an edge crossing in a mapping as two pairs
of endpoints whose geodesic paths intersect once their endpoint qubits have
been mapped. These crossings can indicate network congestion, as the simultaneous
execution of two crossing operations could attempt to use the same resources on the
network (e.g., swap through the same qubits). While the edge crossing metric is tightly
correlated with routing congestion, minimizing it has been shown to be NP-hard and
computationally expensive [241]. An edge crossing in a mapping also does not exactly
correspond to induced network congestion, as more sophisticated routing algorithms
can in some instances still perform these braids in parallel [242]. Some algorithms
exist to produce crossing-free mappings of planar interaction graphs, although these
typically pay a high area cost to do so [243].
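
To make the first and third metrics concrete, the following sketch evaluates a mapping on a 2D
grid. It is a simplification under stated assumptions: edges are treated as straight line
segments rather than geodesic paths on the device, edges sharing an endpoint are not counted as
crossing, and the function names are hypothetical.

```python
from itertools import combinations
from math import hypot

def _orient(p, q, r):
    """Sign of the cross product (q - p) x (r - p)."""
    v = (q[0] - p[0]) * (r[1] - p[1]) - (q[1] - p[1]) * (r[0] - p[0])
    return (v > 0) - (v < 0)

def crosses(seg1, seg2):
    """True if two segments properly intersect; shared endpoints do not count."""
    (a, b), (c, d) = seg1, seg2
    if len({a, b, c, d}) < 4:
        return False
    return (_orient(a, b, c) * _orient(a, b, d) < 0 and
            _orient(c, d, a) * _orient(c, d, b) < 0)

def mapping_metrics(edges, placement):
    """Return (average Euclidean edge length, number of edge crossings)."""
    segs = [(placement[u], placement[v]) for u, v in edges]
    avg_len = sum(hypot(p[0] - q[0], p[1] - q[1]) for p, q in segs) / len(segs)
    n_cross = sum(crosses(s, t) for s, t in combinations(segs, 2))
    return avg_len, n_cross

edges = [("a", "b"), ("c", "d")]
diagonal = {"a": (0, 0), "b": (1, 1), "c": (0, 1), "d": (1, 0)}  # edges cross
parallel = {"a": (0, 0), "b": (0, 1), "c": (1, 0), "d": (1, 1)}  # edges do not
```

On this toy input the `parallel` placement is better on both metrics: its edges are shorter and
do not cross.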


With these objectives in mind, we demonstrate three procedures designed to optimize 
mappings, namely the graph partitioning approach, force-directed annealing approach, and 
community clustering approach. 


Recursive Graph Partitioning 

To compare against the local force-directed annealing approach, we also analyzed the
performance of a global grid embedding technique based upon graph partitioning (GP) [244-246].
In particular, we utilized a recursive bi-sectioning technique that contracts vertices according to 
a heavy edge matching on the interaction graph, and makes a minimum cut on the contracted 
graph. This is followed by an expanding procedure in which the cut is adjusted to account for 
small discrepancies in the original coarsening [247, 248]. Each bisection made in the interaction 
graph is matched by a bisection made on the grid into which logical qubits are being mapped. 
The recursive procedure ultimately assigns nodes to partitions in the grid that correspond to 
partitions in the original interaction graph. 


Force-Directed Annealing 

The force-directed annealing [249-251] procedure consists of iteratively calculating cumulative
forces and moving vertices according to these forces. Vertex-vertex attraction (to reduce
edge length), edge-edge repulsion (to reduce edge density), and magnetic dipole edge rotation
(to reduce edge crossings) are used to calculate a set of forces incident upon each vertex of the
graph [161]. Once this is complete, the annealing procedure begins to move vertices through
the mapping along a pathway directed by the net force calculation. A cost metric determines
whether or not to complete a vertex move, as a function of the combination of average edge
length, average edge spacing, and number of edge crossings. The algorithm iteratively calculates
forces and transforms an input mapping accordingly, until it converges to a
local minimum.

Community Clustering

In an interaction graph, subsets of qubits may interact more closely than others. These groups
of qubits can be detected by performing community detection analysis on an interaction graph,
using methods such as random walks, edge betweenness, spectral analysis of graph matrices, and
others [252-257]. By detecting these structures, we can find embeddings that preserve locality for
qubits that are members of the same community, thereby reducing the average edge distance of the
mapping and localizing the congestion caused by subsets of the qubits.

To respect the proximity of the vertices in a detected community, we break up our procedure
into two parts: first, impose a repulsion force between two communities such that they
do not intersect and are well separated spatially; and second, if one community has been broken
up into individual components/clusters, join the clusters by exerting attracting forces on them.
In particular, we use the k-means clustering algorithm [258, 259] to pinpoint the centroid
of each cluster within a community and use the centroids to determine the scale of the attraction
force for joining them.


6.4.2 STRATEGICALLY REUSING QUBITS 


Reclaiming Qubits via Measurement and Reset 

When some qubits are disentangled from the data qubits, we can directly reclaim those qubits
by performing a measurement and reset. We can reduce the number of qubits by moving
measurements as early as possible in the program, so early that we can reuse the same qubits after
measurement for other computation. Prior art [260, 261] has extensively studied this problem
and proposed algorithms for discovering such opportunities.

This measurement-and-reset (M&R) approach has limitations. First, today’s NISQ hardware
does not yet support fast qubit reset, so reusing qubits after measurement could be costly
or, in many cases, infeasible. The state-of-the-art technique for resetting a qubit on a NISQ
architecture is to wait long enough for qubit decoherence to happen naturally, typically on the
order of milliseconds for superconducting machines [51], significantly longer than the typical
gate time of tens to hundreds of nanoseconds. FT architectures have much lower measurement
overhead (roughly the same as that of a single gate operation), and thus are more amenable to
the M&R approach. Second, qubit rewiring as introduced in [261] works only if measurements
can be done early in a program, which may be rare in quantum algorithms: measurements
are absent in many programs (such as arithmetic subroutines) or only present somewhere deep
in the circuit. Unlike the uncomputation approach, M&R does not actively create qubit reuse
opportunities.


Reusing Qubits via Borrowing 

Another strategy for reusing qubits involves temporarily borrowing a qubit for computation and
returning the qubit to its original state when completed. This technique is sometimes called the
“dirty borrowing” of qubits, because the qubits we borrow can be in an arbitrary unknown quantum
state; this is to be contrasted with the uncomputation technique we will introduce next, in
which the qubits to reuse are always clean ancilla (i.e., qubits initialized to a known state such
as $|0\rangle$).

Figure 6.14: Implementation of A?(X) gate using an ancilla qubit in an arbitrary state.

For example, Figures 6.13 and 6.14 demonstrate the implementation of multi-controlled 
NOT gate such as A?(X) via borrowing of clean and dirty ancilla qubits, respectively. 

Dirty borrowing opportunities depend highly on the structures in quantum circuits; the
reason is two-fold. First, we need to return the borrowed qubits to their original states in a
timely manner, otherwise the original computation has to be stalled. Second, the borrowing
computation is restricted. For example, one typically cannot perform measurements on the borrowed
qubits, due to entanglement with other qubits. Because the quantum state of the dirty ancilla is
unknown, the borrowing circuit typically relies on the difference between $|\psi\rangle$ and
$X|\psi\rangle$, instead of the state $|\psi\rangle$ itself, to perform computation. This technique
has been applied to speed up implementations of arithmetic circuits as found in [262, 263].
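
The idea of relying on the difference between the ancilla state and its flipped version can be
checked on a well-known small construction: a triple-controlled NOT built from four Toffoli gates
and one borrowed qubit. This is offered as an illustration and is not necessarily the circuit of
Figure 6.14; the function and variable names are hypothetical. Since the circuit only permutes
computational basis states, checking every basis input also covers superpositions by linearity.

```python
from itertools import product

def toffoli(bits, c1, c2, target):
    """Flip the target bit when both control bits are 1."""
    bits[target] ^= bits[c1] & bits[c2]

def c3x_with_dirty_ancilla(bits, a, b, c, target, dirty):
    """Triple-controlled NOT using four Toffolis and one borrowed qubit."""
    toffoli(bits, c, dirty, target)   # target ^= c * g
    toffoli(bits, a, b, dirty)        # g ^= a * b
    toffoli(bits, c, dirty, target)   # target ^= c * (g ^ a*b); the c*g terms cancel
    toffoli(bits, a, b, dirty)        # restore the borrowed qubit

def check_all_inputs():
    """Verify the gate and the ancilla restoration on every basis input."""
    for a, b, c, t, g in product([0, 1], repeat=5):
        bits = {"a": a, "b": b, "c": c, "t": t, "g": g}
        c3x_with_dirty_ancilla(bits, "a", "b", "c", "t", "g")
        if bits != {"a": a, "b": b, "c": c, "t": t ^ (a & b & c), "g": g}:
            return False
    return True
```

The borrowed qubit `g` may start as 0 or 1; in both cases the target picks up exactly `a AND b AND
c`, and `g` is handed back unchanged.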


6.4.3 HIGHLIGHT: UNCOMPUTATION 


Reclaiming ancilla qubits that are entangled with data qubits is non-trivial, as measuring and
resetting them will alter the data qubits’ state. Fortunately, uncomputation, introduced by
Bennett [74], is a process for undoing a computation in order to remove the entanglement
between ancilla qubits and data qubits left by previous computations. Figure 6.15
illustrates this process. In that circuit diagram, the $U_f$ box denotes the circuit that computes a
classical function f. The garbage produced at the end of $U_f$ is cleaned up by storing the
output elsewhere and then undoing the computation. This “storing” step can be done by applying
qubit-wise CNOT gates controlled on the output qubits and targeting the copy qubits. One
could alternatively accumulate the output values into some result qubits, using operations such as
in-place addition [212]. The “uncomputation” step is accomplished by applying the inverse of each
gate in $U_f$ in reverse order, denoted as $U_f^{-1}$ in the diagram. In the end, all output values
are safely stored and all ancilla qubits are reset to their initial value, typically the
$|0\rangle$ state.

Figure 6.15: Ancilla qubit reclamation via uncomputation. Each horizontal line is a qubit. Each
solid box contains reversible gates. Qubits are highlighted red for the duration of being garbage.
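
The compute, copy, uncompute pattern can be sketched on classical bits, which is sufficient for
circuits made of NOT, CNOT, and Toffoli gates. In the sketch below a gate is written as a tuple
whose last entry is the target and whose earlier entries are controls; this encoding, the function
names, and the example function f(a, b, c) = a AND b AND c are assumptions for illustration.

```python
from itertools import product

def apply(circuit, bits):
    """Apply NOT/CNOT/Toffoli gates given as (control..., target) tuples."""
    for gate in circuit:
        *controls, target = gate
        if all(bits[q] for q in controls):
            bits[target] ^= 1

def compute_copy_uncompute(u_f, bits, output, result):
    apply(u_f, bits)                    # compute: leaves garbage on the ancilla
    apply([(output, result)], bits)     # store: CNOT the output onto a result qubit
    apply(list(reversed(u_f)), bits)    # uncompute: these gates are self-inverse

# Hypothetical U_f for f(a, b, c) = a AND b AND c with one ancilla.
u_f = [("a", "b", "anc"), ("anc", "c", "out")]

def run(a, b, c):
    bits = {"a": a, "b": b, "c": c, "anc": 0, "out": 0, "res": 0}
    compute_copy_uncompute(u_f, bits, "out", "res")
    return bits

all_clean = all(run(a, b, c) == {"a": a, "b": b, "c": c, "anc": 0, "out": 0,
                                 "res": a & b & c}
                for a, b, c in product([0, 1], repeat=3))
```

After the three stages only `res` holds the answer; `anc` and `out` are back at 0 and can be
handed to another computation.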

This uncomputation approach has two potential limitations: first, we need to pay for
the additional gate cost, and second, it only works if the circuit $U_f$ is a classical reversible
transformation (i.e., one that can be implemented with Toffoli gates alone, optionally with NOT
and CNOT gates). This excludes common quantum gates such as the H gate and the Phase gate.
Nonetheless, classical reversible computation implements the arithmetic logic on a quantum
computer, which plays a large part in many quantum algorithms [11, 13, 214].

The optimization of qubit allocation and reclamation in reversible programs dates back
to work as early as [213, 264], which proposes reducing qubit cost via fine-grained
uncomputation at the expense of increased time. Since then, more work [265-268] has followed in
characterizing the complexity of reclamation for programs with complex modular structures.
Recent work in [201, 212] shows that knowing the structure of the operations in $U_f$ can also
help identify bits that may be eligible for cleanup early. A more recent example [269] improves
the reclamation strategy for straight-line programs using a SAT-based algorithm. Some of the
above work emphasizes identifying reclamation opportunities in a flat program, whereas our
focus is on coordinating multiple reclamation points in a larger modular program.

Most existing qubit reuse algorithms using uncomputation [212, 213, 264] emphasize
the asymptotic qubit savings, and commonly make the idealized assumption that machines have
all-to-all qubit connectivity (i.e., no locality constraint). Since all qubits are considered
identical, a straightforward way to keep track of qubits is to maintain a global pool, sometimes
referred to as the ancilla heap. Ancilla qubits are pushed to the heap when they are reclaimed, and
popped off when they are allocated, for instance in a last-in-first-out (LIFO) manner. In this
ideal model, we can simply track qubit usage by counting the total number of fresh qubits ever
allocated during the lifetime of a program.
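
A minimal sketch of such a pool under the all-to-all assumption is shown below. The class name
`AncillaHeap` and its methods are hypothetical; qubits are represented by integer identifiers.

```python
class AncillaHeap:
    """Global pool of reclaimed ancilla qubits, reused in LIFO order."""

    def __init__(self):
        self._free = []        # reclaimed qubits, used as a stack
        self.fresh_count = 0   # total fresh qubits ever allocated

    def allocate(self):
        """Reuse the most recently reclaimed qubit, or create a fresh one."""
        if self._free:
            return self._free.pop()
        qubit = self.fresh_count
        self.fresh_count += 1
        return qubit

    def reclaim(self, qubit):
        """Return a clean qubit (e.g., after uncomputation) to the pool."""
        self._free.append(qubit)


heap = AncillaHeap()
q0 = heap.allocate()   # fresh qubit 0
q1 = heap.allocate()   # fresh qubit 1
heap.reclaim(q0)
q2 = heap.allocate()   # reuses qubit 0 instead of creating a third qubit
```

Here `heap.fresh_count` is the qubit usage metric described above: it stays at 2 even though
three allocations were made.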


Reversible Pebbling Game 

Uncomputation exposes a very interesting tradeoff between the space and time costs of a circuit.
One may worry that the substantial gate cost of running computation forward and backward is too
high to be practical. Fortunately, it is possible to significantly reduce space by cleaning up
garbage qubits, strategically uncomputing parts of the circuit during the course of computation.
The problem resembles a game, namely the reversible pebble game. Suppose we divide a computation
into m sequential steps of roughly equal size, each of which is represented by a node in a graph.
The dependency graph thus looks like a length-m directed path. Consider executing step k as placing
a pebble on the kth node of the graph, and reversing step k as removing a pebble from the node.
The goal of the game is to place a pebble on the mth node, under the rule that we can place or
remove a pebble on node k if $k = 1$ or node $k - 1$ has a pebble.

We now describe a divide-and-conquer strategy that reaches node m using only $\log_2 m + 1$
pebbles. Suppose we divide a computation into $m = 2^d$ steps of roughly equal size. Then we
can construct a program call graph that is a balanced binary tree of depth d, where the leaves
contain the steps in the computation, and all internal nodes have two child function calls.
We execute the program by traversing the program call graph in depth-first search order. In
particular, when we enter a node in the tree, we execute the corresponding function forward;
when we exit a node in the tree, if it is the left child of its parent, we execute the function
backward. Pictorially, we demonstrate the process for $d = 2$ in Figure 6.16.

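As a result, we follow a recursive pattern of two steps forward and one step backward as
we proceed in the computation. That is, for every node $f_p$ whose two children are denoted as
$f_l$ and $f_r$, we execute $f_l$ forward, then $f_r$ forward, and then $f_l$ backward:

$$f_p = f_l^{-1} \, f_r \, f_l .$$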

Assume each step in the computation takes unit time. The total time to
execute the program without uncomputation is $T = 2^d$. In comparison, the total time to
execute the program with the divide-and-conquer approach is
$T' = 3^d = T^{\log_2 3} \approx T^{1.585}$. In principle, one can further reduce the exponent
to close to 1 by dividing the computation into $\ell^d$ steps
and constructing a balanced $\ell$-ary tree where the first $\ell - 1$ children are uncomputed.

Figure 6.16: Divide-and-conquer approach to reversible pebble game. The balanced binary tree
is traversed in depth-first search order. Function is executed when entering a node in the tree;
function is uncomputed when exiting a node that is a left child of its parent (denoted by blue
check marks).
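
The pebble and time counts can be checked with a short simulation. The sketch below is one
recursive formulation of the two-steps-forward, one-step-back pattern, written for this
illustration (the names `pebble_moves` and `simulate` are hypothetical). It generates the move
sequence for $m = 2^d$ nodes and replays it while enforcing the rule of the game.

```python
def pebble_moves(d):
    """List the (action, node) moves that leave one pebble on node 2**d."""
    moves = []

    def run(lo, depth, forward):
        # Forward: assumes node lo is pebbled (or lo == 0) and pebbles lo + 2**depth.
        # Backward: removes that pebble again, under the same assumption.
        if depth == 0:
            moves.append(("place" if forward else "remove", lo + 1))
            return
        mid = lo + 2 ** (depth - 1)
        run(lo, depth - 1, True)       # step forward to the midpoint
        run(mid, depth - 1, forward)   # place or remove the far endpoint
        run(lo, depth - 1, False)      # step back: clear the midpoint

    run(0, d, True)
    return moves

def simulate(moves):
    """Replay moves, enforce the pebbling rule, and return (final set, peak count)."""
    pebbled, peak = set(), 0
    for action, k in moves:
        if not (k == 1 or (k - 1) in pebbled):
            raise ValueError(f"illegal move on node {k}")
        if action == "place":
            pebbled.add(k)
        else:
            pebbled.remove(k)
        peak = max(peak, len(pebbled))
    return pebbled, peak
```

For d = 3 this yields 27 moves and a peak of 4 pebbles, in agreement with $T' = 3^d$ and
$\log_2 m + 1$ for $m = 8$.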


In a realistic setting, the gate cost and qubit cost for each step can be different. One
important factor to consider is the locality of qubits in a quantum architecture. A two-qubit
gate, such as a CNOT gate, takes time proportional to the communication cost between the
two operand qubits. So, in general, uncomputation needs to be performed frequently enough to
prevent garbage from accumulating, making qubits with good locality available for reuse. In the
next section, we demonstrate how to use a heuristic-based uncomputation algorithm to integrate
qubit allocation with reclamation under realistic architectural constraints.


Heuristic-Based Noise-Aware Uncomputation 

To more realistically estimate the cost of running algorithms on a machine with qubit local- 
ity constraint, we need to take the communication overhead into account [227]. Suppose we 
are given a program with n functions that potentially have qubit reclamation. Our goal is to 
determine the optimal choices of whether to reclaim at each of those n locations. 

Qubit allocation can benefit from locality awareness. A good algorithm prioritizes qubits 
according to their locations in the machine. At a high level, it chooses qubits from the ancilla 
heap by balancing three main considerations—communication, serialization, and area expan- 
sion. 

When deciding which qubits to allocate and reuse, one approach is to use a heuristic-based
algorithm that assigns priorities to all qubits. The priorities are weighted not only by
the communication overhead of two-qubit interactions but also by their potential impact on
the parallelism of the program. Reusing qubits adds data dependencies to a program and thus
serializes computation, but not reusing qubits expands the area of active qubits and thus increases
the communication overhead between them.

Uncomputation decisions can be made with a simple cost-benefit analysis: at each potential
reclamation point, we estimate and compare two quantities: (i) $C_1$, the cost of uncomputation and
reclaiming ancilla qubits; and (ii) $C_0$, the cost of no uncomputation and leaving garbage qubits.
To do so, we need an efficient way to accurately estimate $C_1$ and $C_0$. This is a non-trivial
task. In particular, the decision made for a child function affects not only its own cost, but also
the cost of its parent function. If a child function decides to uncompute, the additional gate costs
need to be duplicated should its parent decide to uncompute as well. This is a phenomenon called
“recomputation.” Thus, we should take the level of the function into account when we make the
decision.
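
The shape of such a decision rule can be sketched as follows. This is a toy model and not the
estimator used in the cited work: the doubling of gate cost per uncomputing ancestor and the
linear penalty per garbage qubit are assumptions chosen only to show how recomputation and the
function level enter the comparison, and all names are hypothetical.

```python
def should_uncompute(gate_cost, garbage_qubits, uncomputing_ancestors,
                     cost_per_garbage_qubit):
    """Compare a toy estimate of C1 (uncompute) against C0 (leave garbage)."""
    # Assumption: each enclosing function that also uncomputes replays this
    # function's extra gates once more, doubling the cost per level.
    c1 = gate_cost * 2 ** uncomputing_ancestors
    # Assumption: leaving garbage costs a fixed penalty per held qubit.
    c0 = garbage_qubits * cost_per_garbage_qubit
    return c1 < c0
```

With these placeholder costs, a function that is worth uncomputing at the top level can stop being
worth it when nested two levels deep, which is the effect the text attributes to recomputation.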


6.5 SUMMARY AND OUTLOOK


A generic tool flow for quantum programming and compilation includes multiple layers of
transformations and optimizations. A quantum algorithm is implemented in a quantum program
using a quantum domain-specific language (DSL) (see Chapter 5). The program is then translated
into a quantum intermediate representation (QIR), undergoing a series of transformations in
the compiler, including circuit synthesis and optimization, gate scheduling, and qubit mapping.
The QIR is then translated into analog pulse sequences for controlling qubits (see Chapter 7).
References [31, 270] discuss the design of quantum computer architectures in greater detail.

The most widely used approach for combining all transformations in the compiler is
called the pass-driven approach, where each transformation/optimization is applied once in
sequence. Another approach, called instrument-driven (commonly used in LLVM), is
somewhat unusual in that we produce a classical executable program that generates the output
QIR. For instance, we take a C program with quantum code, turn it into an executable classical
program that contains all the classical control such as conditionals and loops, and then print out
the quantum program as it runs. One of the advantages of this approach is that it makes the
analysis more tractable. With really large programs and really large machines, the pass-driven
approach would occupy a huge amount of memory, so the instrument-driven approach is more scalable.
For NISQ machines, scalability is usually not our main concern; producing an optimal program is
more important.
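
The instrument-driven idea can be illustrated with a toy example in which classical control is
executed by an ordinary program and only the resulting flat instruction list is emitted. The
instruction syntax and the function name below are hypothetical and are not the QIR format of any
particular toolchain.

```python
def ghz_program(n):
    """Run the classical control flow and emit a flat list of gate instructions."""
    instructions = ["h q0"]
    for i in range(n - 1):          # classical loop, resolved while the program runs
        instructions.append(f"cx q{i} q{i + 1}")
    return instructions

emitted = ghz_program(3)
```

The loop never appears in the output; the compiler only ever sees the unrolled gate sequence,
which can be streamed out without holding the whole program structure in memory.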


Further Reading 

Here we highlight some other exciting research directions in software support for NISQ
computers. Efficient logic synthesis methods for higher-dimensional quantum systems are not yet
well understood. It remains an open problem to produce optimal circuits for logic in d-level
systems (with qudits, as opposed to qubits) [226, 271-274]. Even though we have divided the
discussion of quantum computer systems into different layers, the layers are intertwined. In fact,
cross-layer optimizations are believed to be critical for bridging the gap between algorithms and
hardware [31, 275]. The growing complexity of compiler transformations also necessitates the
verification of the compiler.

Quantum circuits need to be highly optimized so as to fit within exacting resource
constraints. Hence, an active research direction is the logical (instruction-level) optimization
of quantum circuits. A wide range of techniques have been developed to optimize various aspects
of quantum circuits. For example, circuit template optimization recognizes circuit patterns and
replaces them with more efficient equivalents. In a nutshell, given a start state and an end
state, we aim to find the shortest path between those two states. This is to be distinguished from
optimizations using pulse compilation, where we look for the most efficient physical pulse to
implement a native instruction as accurately as possible. In the logical case, at the compiler
level, we substitute a sequence of instructions that is logically equivalent to the ideal sequence.
That being said, circuit optimization and pulse optimization are not without connections. For
instance, sometimes a more accurate pulse can be implemented using one sequence of instructions
than another, so knowledge of pulse control can sometimes inform a compiler which compiled
program is preferable.

CHAPTER 7


Microarchitecture and Pulse
Compilation


One cannot really talk about building scalable quantum computer systems without discussing
architectural support for the increasingly complex control and memory modules.
As we enter the NISQ era and beyond, we will need to orchestrate simultaneous
quantum operations on hundreds or thousands of qubits. What kind of microarchitecture can
keep up with the speed and bandwidth of quantum information processing? How do we build
a reliable interface between classical control/feedback signals and quantum data? Can we
efficiently translate and synchronize machine pulses from gate instructions? This may be feasible
by hand for small-scale devices with 5-10 qubits, but it will soon become intractable without an
automated, robust control system as the size of the devices scales up.

In this chapter, we pay particular attention to three aspects of such systems: classical and 
quantum control of qubits, pulse generation and optimization, and calibration and verification. 
“From Gates to Pulses” describes the general flow for constructing pulse sequences. “Quantum 
Controls and Pulse Shaping” illustrates the progress and challenges in managing the classical 
and quantum datapath required for a large number of qubits under tight time, power and band- 
width budgets. Next, “Quantum Optimal Control” shows the general principles in translating 
quantum gates to hardware pulses, and demonstrates two novel techniques, each targeting higher 
pulse quality and faster compilation time. 


7.1 FROM GATES TO PULSES: AN OVERVIEW


7.1.1 GENERAL PULSE COMPILATION FLOW 


At the lowest level of control hardware, qubits are driven by analog pulses. Recall from Chapter 2 
that, depending on the types of the qubits, these pulses are sent in different forms, e.g., modu- 
lated lasers for trapped ion qubits, and microwave electric signals for superconducting transmon 
qubits. Therefore, quantum compilation must translate from a device-independent high-level 
quantum program down to a sequence of device-dependent control pulses. 

Figure 7.1 shows the general flow for compiling analog pulses. The input to this process 
is a sequence of quantum instructions (produced by logical-level compilations and optimiza- 
tions including scheduling and mapping), and the output is a sequence of analog pulses that 
implements the logical instructions. 
Figure 7.1: General flow for translating quantum gates to analog pulses.






























































The simplest approach to perform such gate-to-pulse translation is by using a lookup table
(LUT). Once a quantum algorithm has been decomposed into a quantum circuit comprising
single- and two-qubit gates, the compiler simply proceeds by concatenating a sequence of pulses
corresponding to each gate. In particular, a lookup table maps each gate in the gate set to
a sequence of control pulses that executes that gate.
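
The lookup-and-concatenate step amounts to very little code, as the sketch below suggests. The
table entries are placeholder strings rather than real waveforms, gate parameters such as rotation
angles are omitted, and the names `PULSE_TABLE` and `compile_to_pulses` are hypothetical.

```python
# Hypothetical lookup table: gate name -> ordered list of pulse labels.
PULSE_TABLE = {
    "rx": ["drive_x"],
    "rz": ["frame_shift"],
    "cx": ["cross_drive", "drive_x"],
}

def compile_to_pulses(circuit, table=PULSE_TABLE):
    """Concatenate the table entry of each gate, in circuit order."""
    schedule = []
    for gate, qubits in circuit:
        for pulse in table[gate]:
            schedule.append((pulse, qubits))
    return schedule

circuit = [("rx", (0,)), ("cx", (0, 1)), ("rz", (1,))]
schedule = compile_to_pulses(circuit)
```

Each gate is translated in isolation, which is what makes this approach fast and also what rules
out pulse-level optimization across gate boundaries.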

For example, in a superconducting architecture, the $\{R_x(\theta), R_z(\phi), \mathrm{CX}\}$ gate
set alone is sufficient for universality, so in principle the H and SWAP gates could be removed
from the compilation basis gate set. However, we include the generated pulses.

The advantage of the LUT approach is its short pulse compilation time, as the lookup
and concatenation of pulses can be accomplished almost instantaneously. However, it prevents
the optimization of pulses across gate boundaries, even though there might exist a global
pulse for the entire circuit that is shorter and more accurate than the concatenated one. The
quality of the concatenated pulse relies heavily on an efficient gate decomposition of the quantum
algorithm.

Another generic technique for generating pulses is through pulse shaping tools based on 
quantum optimal control (QOC) theory, which is illustrated in detail in Section 7.3. 

The next section delves into pulse shaping techniques and introduces the design principles
of quantum controls in greater detail.


7.2 QUANTUM CONTROLS AND PULSE SHAPING


Controllability of a quantum system is a fundamental issue. The aim is to establish
a framework of strong theoretical understanding and practical methodology for driving a
physical system to a desired state. This is challenging because quantum systems exhibit unique
characteristics (such as coherence, superposition, and entanglement). One of the most prominent
distinctions between classical control and quantum control is the difficulty in acquiring
information about the state of a quantum system without disturbing it, which makes feedback
control non-trivial. In the following, we describe some leading strategies for qubit control and
demonstrate the challenges in crossing the classical-quantum boundary.

Figure 7.2: Open-loop control (left) vs. closed-loop control (right) for pulse shaping.


7.2.1  OPEN-LOOP VS. CLOSED-LOOP CONTROL 


Quantum control techniques can be generally classified into two categories: (i) open-loop
control and (ii) closed-loop control. The primary distinction between the two is whether there is
real-time feedback from the quantum system: in open-loop control, the system's output is not
continuously monitored, whereas in closed-loop control, the controller calibrates the pulses based
on continuous feedback. Figure 7.2 pictorially illustrates the distinction.

Studies in open-loop control generally fall into three categories. First, some work focuses
on the optimality and reachability of pulses for different quantum systems (i.e., Hamiltonians).
One widely used approach is to express the controllability criteria in terms of structures in Lie
groups and Lie algebras [276], or in terms of graph-theoretical concepts [277, 278]. Second,
numerical optimal control theory has been shown to produce versatile and realistic pulse sequences.
It uses numerical methods to search for the best way of achieving given quantum objectives
in the shortest time and with the most realistic pulse shape. This strategy is sometimes referred
to as open-loop coherent control [279-281]. Third, Lyapunov-based control design [282, 283] is
another useful approach in open-loop quantum control. In this approach, the control input is
determined by the state of the system. In quantum control, however, it is non-trivial to obtain
information about quantum states without disturbing them. Therefore, an open-loop control design is
used to simulate an artificial closed-loop system.

Closed-loop control is believed to be more robust and reliable.
There are in general two different approaches in closed-loop control [284]: (i) learning control
and (ii) feedback control. In closed-loop learning control, each cycle of the loop is executed on
a new copy of the system [285], whereas in feedback control, information about the state of
the quantum system is continuously fed back to the controller through measurement or state
estimation [286].

For more details on quantum control, we refer interested readers to the tutorials
[276, 287-289].
7.3 QUANTUM OPTIMAL CONTROL


Given a general setting of quantum systems, GRadient Ascent Pulse Engineering (GRAPE) is a
strategy for numerically finding the best control pulses for computation by following a gradient
descent procedure [290, 291]. This is sometimes viewed as the last step in quantum circuit
synthesis: the output of GRAPE is the set of control pulse parameters needed for the underlying
hardware architecture.

The basics of GRAPE can be illustrated by examining the Hamiltonian picture of the
physical systems. Consider a quantum system with intrinsic Hamiltonian $H_0$ and a set of
external controls, in the form of time-dependent Hamiltonian operators,
$\{H_1(t), H_2(t), \ldots, H_m(t)\}$. The overall system Hamiltonian can be written as

    $$H(t) = H_0 + \sum_{i=1}^{m} H_i(t),$$

where $H_i(t)$ is typically a product of a time-dependent dimensionless amplitude and a
time-independent control operator. That is,

    $$H_i(t) = u_i(t)\, H_i.$$

The objective of optimal control theory is to find the control fields
$\{u_1(t), u_2(t), \ldots, u_m(t)\}$ such that the evolution generated by $H(t)$ approximates
the target unitary $U$ to high precision.

What is the implemented unitary as a result of the Hamiltonian evolution? Solving a
continuous time-dependent equation is challenging, so we discretize the time interval into
sufficiently small time steps $\delta t$. This is called the piecewise-constant approximation.
Suppose we evolve the quantum system from $t_0$ to the $j$th timestep, $t_j = t_0 + j\,\delta t$.
For timestep $j$, the set of constant control fields is denoted by
$\{u_{1,j}, u_{2,j}, \ldots, u_{m,j}\}$. Then the time-independent Hamiltonian for timestep $j$ is

    $$H_j = H_0 + \sum_{i=1}^{m} u_{i,j} H_i.$$

Therefore, the unitary operation accomplished at timestep $j$ is

    $$U_j = e^{-i H_j \delta t}.$$

Evolving from $t_0$ to $t_j$ by the piecewise-constant approximation can be written as

    $$U = U_j U_{j-1} \cdots U_2 U_1.$$
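The piecewise-constant evolution above can be sketched numerically. The following is a minimal illustration, not the implementation used in the cited work: the function names, the choice of a single Pauli-X control with no drift term, and the pulse values are all assumptions made for the example.

```python
import numpy as np

# Illustrative control operator (an assumption, not taken from the text).
X = np.array([[0, 1], [1, 0]], dtype=complex)

def step_unitary(H, dt):
    """Return exp(-i H dt) for a Hermitian H via its eigendecomposition."""
    evals, evecs = np.linalg.eigh(H)
    return evecs @ np.diag(np.exp(-1j * evals * dt)) @ evecs.conj().T

def propagate(H0, controls, u, dt):
    """Compute U = U_j ... U_2 U_1, where u[i, j] is the amplitude of control i at step j."""
    U = np.eye(H0.shape[0], dtype=complex)
    for j in range(u.shape[1]):
        Hj = H0 + sum(u[i, j] * controls[i] for i in range(len(controls)))
        U = step_unitary(Hj, dt) @ U  # later timesteps multiply on the left
    return U

# Hypothetical pulse: a constant X drive whose total area is pi/2.
n_steps, dt = 10, 0.1
u = np.full((1, n_steps), (np.pi / 2) / (n_steps * dt))
U = propagate(np.zeros((2, 2), dtype=complex), [X], u, dt)
```

With these assumed values the pulse implements an X gate up to a global phase, since exp(-i (pi/2) X) = -i X.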


GRAPE performs a gradient descent search over the space of possible control field
parameters that approximate the targeted unitary matrix up to a specified fidelity. In general,
if the input state to the computation is known, GRAPE can optimize for a control pulse that works
for that particular input state. In other words, we can find an approximation
$\tilde{U} \neq U$, but

    $$|\psi_{\text{out}}\rangle = \tilde{U}\,|\psi_{\text{in}}\rangle \approx U\,|\psi_{\text{in}}\rangle.$$


Besides the final fidelity, the set of control field parameters must satisfy some constraints, 
such that the resulting control pulses are physically realizable. Following the analysis in [292], 
we name a few such constraints. 


* The amplitude of each control field parameter, $|u_{i,j}|$, is to be minimized.


* Each control parameter needs to form a smooth pulse over time;
$\sum_{i,j} |u_{i,j} - u_{i,j+1}|$ is to be minimized.


* Evolution time, i.e., pulse time, is to be minimized, because qubits have short lifetimes
due to quantum decoherence.


Minimizing the cost function is the key in quantum optimal control. To be able to apply
GRAPE, the cost functions need to be differentiable. The gradients are computed analytically
and backpropagated with automatic differentiation. GRAPE has been used to mitigate other
sources of error such as gate errors, State Preparation and Measurement (SPAM) errors, and
qubit crosstalk, as demonstrated in past work [293-295].
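To make the optimization loop concrete, here is a toy sketch of gradient-based pulse search on a single qubit. It deliberately swaps in central finite differences and fixed-step gradient descent for the analytic gradients and automatic differentiation described above, and it uses only the infidelity as the cost (no amplitude, smoothness, or time penalties). The drift-free Hamiltonian, the step size, and all function names are assumptions for illustration.

```python
import numpy as np

X = np.array([[0, 1], [1, 0]], dtype=complex)

def step_unitary(H, dt):
    """exp(-i H dt) for Hermitian H."""
    evals, evecs = np.linalg.eigh(H)
    return evecs @ np.diag(np.exp(-1j * evals * dt)) @ evecs.conj().T

def total_unitary(u, H0, Hc, dt):
    """Piecewise-constant evolution with one control operator Hc."""
    U = np.eye(H0.shape[0], dtype=complex)
    for uj in u:
        U = step_unitary(H0 + uj * Hc, dt) @ U
    return U

def infidelity(u, H0, Hc, dt, target):
    """Cost: 1 - |tr(target^dagger U)|^2 / d^2, insensitive to global phase."""
    d = target.shape[0]
    overlap = np.trace(target.conj().T @ total_unitary(u, H0, Hc, dt))
    return 1.0 - abs(overlap) ** 2 / d ** 2

def finite_diff_grad(u, *args, eps=1e-6):
    """Central finite-difference gradient (stand-in for analytic gradients)."""
    grad = np.zeros_like(u)
    for j in range(len(u)):
        shift = np.zeros_like(u)
        shift[j] = eps
        grad[j] = (infidelity(u + shift, *args) - infidelity(u - shift, *args)) / (2 * eps)
    return grad

# Hypothetical setup: no drift, X control, target is the X gate, 5 timesteps.
H0 = np.zeros((2, 2), dtype=complex)
dt = 0.2
u = np.linspace(0.5, 1.5, 5)          # initial guess for the control amplitudes
args = (H0, X, dt, X)
initial_cost = infidelity(u, *args)
for _ in range(200):                   # fixed-step gradient descent
    u = u - 1.0 * finite_diff_grad(u, *args)
final_cost = infidelity(u, *args)
```

In this toy problem the cost depends only on the total pulse area, so the search converges to an area of pi/2. Realistic instances with drift terms, several controls, and extra penalty terms have far less benign landscapes.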

It remains an open problem to speed up the GRAPE algorithm, as state-of-the-art
techniques scale poorly. While GRAPE on one- or two-qubit unitaries might find the optimal
pulse sequence quickly, it would take an unreasonable amount of time for circuits with five or
more qubits. In the following, we will see a systems approach to speeding up the pulse compilation
process for variational algorithms.


7.3.1 HIGHLIGHT: COMPILATION FOR VARIATIONAL ALGORITHMS 


The conventional approach for compiling large programs is to synthesize a quantum program
using a small set of primitive gates, apply quantum optimal control theory to each primitive
gate, and string together all pulses to accomplish the computation. While this approach results
in constant-time compilation (as we can build a lookup table for the pulses of the primitive
gates and reuse them for every circuit), it does not produce the optimal pulse sequence (e.g.,
shortest pulse length). At the other end of the spectrum, while full quantum optimal control
generates the fastest possible pulse sequence for a target circuit, its compilation latency is
large due to the time spent on GRAPE.

As such, partial compilation has been proposed [159], which compiles only parts of a quan- 
tum circuit using GRAPE while leaving the rest in a lookup table, balancing optimality in pulse 
length and pulse compilation time. 

This approach is particularly useful for variational quantum algorithms. Here the quantum
circuits are parametric, as seen in Chapter 3, meaning that some parts of the circuit
are fixed, while other parts change according to a set of parameters. In a variational
algorithm, the quantum circuit differs only slightly from one iteration to the next. Hence, we can
use GRAPE to recompile only the changing parts, yielding lower cost than recompiling the entire
circuit.


7.4 SUMMARY AND OUTLOOK 


Further Reading 

Quantum control theory has already attained significant success since the beginning of quantum 
computation research [284, 296, 297]. But scalability (in compute time and memory) of the pulse 
optimizer is still an issue. More intelligent methods must be developed for high-dimensional 
space optimizations, as gradient techniques generally do not work well in higher-dimensional 
spaces. Developing methods for modifying pulses remains an exciting open area. This is a well- 
motivated problem because it is potentially beneficial to leverage calibrated pulses for different 
machines, to use one pulse as a guide for optimization on another, or to combine pulses for
subsystems into one composite pulse [298]. Techniques such as dynamical decoupling are extremely
effective in error mitigation, interrupting the evolution of quantum systems with $\pi$-pulses
[299-302]. Recent efforts have used quantum optimal control theory to speed up pulses for a range
of quantum algorithms [159, 292, 303-305].

Controlling quantum computation is difficult, in part because of the boundaries between
the classical and quantum worlds. One such example is the scale boundary: there is a
discrepancy in scale between the qubit object and the classical control mechanism; e.g.,
[131, 306] raise the issue of pitch-matching. Another example is the environment boundary: in
order for qubits to behave quantum mechanically, they may need to be put in some critical
environment, e.g., vacuum and low temperature. As a result, the control mechanism will have to
cross those boundaries so as to send the signals to the right qubits with high precision. Some
have tried integrating classical control circuits into the cryogenic systems of their quantum
device [307-309] to reduce the bandwidth of the classical control signal needed for keeping up
with the computation.
CHAPTER 8


Noise Mitigation and Error 
Correction 


For a quantum computer to work, it takes extreme precision to isolate and control qubits. In
particular, qubits are very short-lived due to interactions with the environment. Quantum
logic gates can have small drifts when pulses are out-of-focus or out-of-tune. Classical controls
and calibrations may have a hard time keeping up in scale, speed, and power. Without
strategies to overcome these sources of noise, errors would accumulate and
eventually lead to critical failure in computation. Physical noise in current devices puts
stringent limitations on their computing capabilities. This chapter aims to address the central
issue: how do we protect information from the adverse impacts of noise? We highlight two
classes of strategies that have been developed for information protection and error reduction in
quantum systems, namely noise mitigation and error correction. In both cases, we demonstrate
a few promising examples that effectively lower the error rates, and then stress the drawbacks
and challenges that still remain.


8.1 CHARACTERIZING REALISTIC NOISES 


From an engineer’s perspective, the making of a practical quantum device is a process where we 
iteratively improve our capability of estimating and fixing sources of errors. The typical engi- 
neering cycle is as follows: as we understand the sources better, we change the focus to the next 
dominant errors and try to measure and fix them again, until we reach some point that many 
hope to achieve, namely a fault-tolerant quantum computer. As of now, in the NISQ era, our 
experience with quantum computing devices is still nascent. 

The state-of-the-art tools we have at hand for measuring sources of errors include (i) state
and process tomography and (ii) randomized benchmarking. The goal of these tools is to
efficiently characterize noise sources.

First we need to understand how to quantitatively study noise. The effects of noise on
a quantum system are typically probabilistic. Starting with a (pure) quantum state, a quantum
system undergoes a spontaneous interaction with the environment, which results in a probability
distribution of quantum states. As such, we revisit our definition of mixed quantum state in
Chapter 2.
Definition 8.1 A mixed quantum state can be written as a probability distribution over some
pure quantum states:

    $$\rho = \sum_i p_i\, |\psi_i\rangle\langle\psi_i|.$$

$\rho$ is called the density matrix of the quantum state. Naturally, we care about the "distance"
between results obtained in two scenarios: ideal and noisy. Two common measures of distance
between two states are fidelity and trace distance.


Definition 8.2 The fidelity between two states is defined as

    $$F(\rho, \sigma) = \left\| \sqrt{\rho}\, \sqrt{\sigma} \right\|_1^2.$$


Definition 8.3 The trace distance between two states is defined as

    $$D_{\mathrm{tr}}(\rho, \sigma) = \frac{1}{2} \left\| \rho - \sigma \right\|_1.$$


Notice that both measures are bounded between 0 and 1, and are symmetric. In fact, fidelity is
related to the trace distance as follows:

    $$1 - \sqrt{F(\rho, \sigma)} \le D_{\mathrm{tr}}(\rho, \sigma) \le \sqrt{1 - F(\rho, \sigma)}.$$
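Both measures are straightforward to evaluate numerically for small systems. The sketch below assumes the squared-trace-norm convention for fidelity, $F = \|\sqrt{\rho}\sqrt{\sigma}\|_1^2$, and computes the trace norm as the sum of singular values. The helper names and the two example states are illustrative choices.

```python
import numpy as np

def psd_sqrt(rho):
    """Matrix square root of a positive semidefinite Hermitian matrix."""
    evals, evecs = np.linalg.eigh(rho)
    evals = np.clip(evals, 0.0, None)  # guard against tiny negative round-off
    return evecs @ np.diag(np.sqrt(evals)) @ evecs.conj().T

def trace_norm(M):
    """Trace norm ||M||_1 = sum of singular values."""
    return np.linalg.svd(M, compute_uv=False).sum()

def fidelity(rho, sigma):
    """F = || sqrt(rho) sqrt(sigma) ||_1^2 (squared convention, assumed here)."""
    return trace_norm(psd_sqrt(rho) @ psd_sqrt(sigma)) ** 2

def trace_distance(rho, sigma):
    """D = 0.5 * || rho - sigma ||_1."""
    return 0.5 * trace_norm(rho - sigma)

# Example states: the pure state |0><0| and the maximally mixed qubit state.
rho = np.array([[1, 0], [0, 0]], dtype=complex)
sigma = 0.5 * np.eye(2, dtype=complex)
F = fidelity(rho, sigma)
D = trace_distance(rho, sigma)
```

For this pair both quantities equal 0.5, which sits inside the bounds $1 - \sqrt{F} \le D \le \sqrt{1 - F}$.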


Suppose we want to evolve a quantum system with some process $U$. However, in reality, we
end up applying a noisy version of the process, $\tilde{U}$. We introduce two common ways of
quantifying the noise of a process, namely average error rate and diamond distance. We write the
effect of noise on a quantum state as a channel: $\rho \to \mathcal{E}(\rho)$.


Definition 8.4 The average error rate of a process $U$ under a noise channel $\mathcal{E}$ can
be written as:

    $$r(U, \mathcal{E}) = 1 - \int d\psi\, \langle\psi|\, U^\dagger\, \mathcal{E}\!\left(U |\psi\rangle\langle\psi| U^\dagger\right) U\, |\psi\rangle.$$


The underlying physical interpretation of this measure is that we send a pure quantum state
$|\psi\rangle$ through a noisy evolution and back, compute the probability of getting the initial
state back, and then average uniformly over all possible pure states. The average error rate is
bounded between 0 and $d/(d + 1)$, where $d$ is the dimension of the Hilbert space.

Another measure of noise is the diamond distance between two processes. 
Definition 8.5 The diamond distance (completely bounded norm) of two processes $P$ and $Q$
is defined as:

    $$D_\diamond(P, Q) = \sup_{\rho} \frac{1}{2} \left\| (P \otimes \mathcal{I})(\rho) - (Q \otimes \mathcal{I})(\rho) \right\|_1.$$


The supremum is taken over all possible mixed quantum states. The interpretation of the
diamond distance is that it measures the worst-case difference between two channels based on
single-shot measurements.

The strategies for characterizing noise are quantum state/process tomography and
randomized benchmarking. The former reveals full information about a quantum state or process
via complex measurements, while the latter reveals partial information through efficient
procedures.


8.1.1 MEASUREMENTS OF DECOHERENCE 


As shown in Section 2.3.3, two canonical measures of the robustness of a qubit are the $T_1$
relaxation time and the $T_2$ relaxation time. In this section, we briefly discuss how to
experimentally determine the $T_1$ and $T_2$ decoherence times.


Measuring T1
To characterize $T_1$ relaxation, the standard approach is to quantify the rate of exponential
decay using the following simple steps.


1. Initialize the qubit to the ground state $\rho = |0\rangle\langle 0|$.


2. Apply an X gate, i.e., $X = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}$, to the qubit, so $\rho' = X \rho X$.


3. Wait for time $t$, arriving at $\rho''$.

4. Measure the probability of $\rho''$ being in $|1\rangle\langle 1|$.

Measuring T2
The standard approach to quantify the rate of exponential $T_2$ decay is as follows.


1. Initialize the qubit to the ground state $\rho = |0\rangle\langle 0|$.

2. Apply an H gate, i.e., $H = \frac{1}{\sqrt{2}} \begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix}$, to the qubit, so $\rho' = H \rho H$.


3. Wait for time $t$, arriving at $\rho''$.

4. Apply the H gate again, so $\rho''' = H \rho'' H$.

5. Measure the probability of $\rho'''$ being in $|0\rangle\langle 0|$.
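Repeating either procedure for a range of wait times $t$ yields a decay curve, from which the time constant can be fitted. The sketch below is illustrative only: it assumes the idealized models $P_1(t) = e^{-t/T_1}$ for the first procedure and $P_0(t) = \frac{1}{2}(1 + e^{-t/T_2})$ for the second, uses noiseless synthetic data, and the time constants (in arbitrary units) are hypothetical.

```python
import numpy as np

def fit_decay_time(t, signal):
    """Fit signal ~ exp(-t / T) by linear regression on log(signal); return T."""
    slope, _ = np.polyfit(t, np.log(signal), 1)
    return -1.0 / slope

# Hypothetical wait times and time constants (arbitrary units).
t = np.linspace(0.0, 100.0, 21)
T1_true, T2_true = 50.0, 30.0

# Synthetic, noiseless measurement probabilities under the assumed models.
p1 = np.exp(-t / T1_true)                 # probability of |1> after waiting (T1 procedure)
p0 = 0.5 * (1.0 + np.exp(-t / T2_true))   # probability of |0> after the second H (T2 procedure)

T1_est = fit_decay_time(t, p1)
T2_est = fit_decay_time(t, 2.0 * p0 - 1.0)  # rescale so the signal decays to zero
```

With real data each probability would be estimated from a finite number of shots, so a weighted or nonlinear fit is usually a better choice than a plain log-linear regression.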
8.1.2 QUANTUM-STATE TOMOGRAPHY


The goal of quantum state tomography [310, 311] is to reconstruct an unknown quantum state
$\rho$ based on the outcomes from a series of measurements. Suppose we define $E_j$ as the
projector for a particular measurement outcome $e_j$; then the probability of obtaining this
outcome when measuring $\rho$ can be written as $\Pr[e_j \mid \rho] = \mathrm{tr}(E_j \rho)$ by
Born's rule. For all possible measurement outcomes, we can construct a histogram of observations
for each measurement. It turns out this sampling process is linear, that is, $A\rho = p$, where
$p$ is the vector of probabilities of measurement outcomes. Then we can use linear inversion to
reconstruct $\rho$ from $p$, i.e., $\hat{\rho} = (A^T A)^{-1} A^T p$.
Experimentalists commonly use the maximum likelihood estimator (instead of linear inversion)
to reconstruct $\hat{\rho} = \arg\max_{\rho} \Pr[y \mid \rho, A]$, where $y$ is the empirical
result of the distribution $p$. A general prescription for tomography is defined as follows.


1. Prepare the target state $\rho$.

2. Measure the state $\rho$ with different projectors.

3. Obtain a probability distribution of measurement outcomes $p$ by sampling repeatedly.

4. Use the estimator $\hat{\rho}$ to reconstruct $\rho$.
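The prescription above can be sketched for a single qubit using linear inversion. This is an illustration under stated assumptions: the six projectors onto the eigenstates of X, Y, and Z are one possible measurement set, the probabilities are computed exactly rather than sampled, and the complex conjugate transpose is used in place of $A^T$ because the vectorized quantities are complex.

```python
import numpy as np

I2 = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)

# Assumed measurement set: projectors onto the +/-1 eigenstates of X, Y, Z.
projectors = [(I2 + s * P) / 2 for P in (X, Y, Z) for s in (+1, -1)]

# Row j of A is vec(E_j)^*, so that (A @ vec(rho))[j] = tr(E_j rho).
A = np.array([E.conj().flatten() for E in projectors])

# Step 1: a hypothetical target state with Bloch vector (0.3, 0, 0.4).
rho_true = 0.5 * (I2 + 0.3 * X + 0.4 * Z)

# Steps 2-3: ideal outcome probabilities (infinite-shot limit, no sampling noise).
p = A @ rho_true.flatten()

# Step 4: linear inversion, rho_hat = (A^dagger A)^{-1} A^dagger p.
rho_vec = np.linalg.solve(A.conj().T @ A, A.conj().T @ p)
rho_est = rho_vec.reshape(2, 2)
```

With finite samples, `p` would be replaced by observed frequencies. The linear-inversion estimate may then fail to be positive semidefinite, which is one reason maximum likelihood estimation is commonly preferred.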


The apparatus for learning $\rho$ is extremely simple:

[Circuit diagram: the state $\rho$ is fed into the measurement $A$.]

The computational complexity and sampling complexity of quantum state tomography 
are high. Algorithms exist that optimally choose the set of measurement operators, so that the 
number of samples needed is minimal. Full quantum tomography typically requires resources 
that scale exponentially with the number of qubits, in spite of improvements from techniques 
such as compressed sensing and direct fidelity estimation [312-314]. 

Tomography, although recovering full state information, is limited in estimating noise of 
a process. This is because the state preparation and measurement process can be erroneous as 
well. This is commonly referred to as the SPAM (State Preparation And Measurement) error. 
With current technologies, SPAM errors are dominant sources of noise. 


8.1.3 RANDOMIZED BENCHMARKING 


In contrast, randomized benchmarking (RB) [315-319] emphasizes gate errors more than
SPAM errors by applying a long sequence of gates and postponing measurements to the very
end. It can be summarized in the following steps.


1. Prepare the target state $\rho$ in the computational basis.


2. Randomly select $m$ Clifford gates: $C$.

3. Apply the gates in $C$ in sequence, and then add the inverse gate at the end.


4. Measure in the computational basis $E$.


5. Obtain the estimate $\hat{p} = \Pr[E \mid C, \rho]$.
Note that we can always find a gate $C_{\mathrm{inv}}$ that is the inverse of the product of all
Clifford gates being applied. The interpretation of randomized benchmarking is that it estimates
the error rate averaged over Clifford gates. Notice that RB focuses on gate-independent noise
caused not by the particular Clifford gates we apply, but rather by other noise that occurs during
the process. As we increase the length $m$, the system will drift farther and farther. This
phenomenon is commonly referred to as the "RB decay."
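The RB steps can be simulated for one qubit. The sketch below is a toy model, not a benchmarking tool: it generates the 24 single-qubit Clifford gates (up to global phase) by closing {H, S} under multiplication, and it assumes a gate-independent depolarizing channel of strength `p` after every gate, including the inverse. All function names and parameter values are illustrative.

```python
import numpy as np

H = np.array([[1, 1], [1, -1]], dtype=complex) / np.sqrt(2)
S = np.array([[1, 0], [0, 1j]], dtype=complex)

def phase_free_key(U):
    """Hashable fingerprint of U that ignores its global phase."""
    flat = U.flatten()
    pivot = flat[np.argmax(np.abs(flat) > 1e-9)]      # first non-zero entry
    V = np.round(U * (abs(pivot) / pivot), 6) + 0.0   # make the pivot real and positive
    return tuple(V.flatten().tolist())

def single_qubit_cliffords():
    """Close {H, S} under multiplication, identifying gates up to global phase."""
    identity = np.eye(2, dtype=complex)
    group = {phase_free_key(identity): identity}
    frontier = [identity]
    while frontier:
        U = frontier.pop()
        for G in (H, S):
            V = G @ U
            key = phase_free_key(V)
            if key not in group:
                group[key] = V
                frontier.append(V)
    return list(group.values())

def rb_survival(m, p, cliffords, rng):
    """Probability of measuring |0> after m random Cliffords plus the inverse gate."""
    def noisy(rho, U):
        rho = U @ rho @ U.conj().T
        return (1 - p) * rho + p * np.eye(2) / 2      # assumed depolarizing noise
    rho = np.array([[1, 0], [0, 0]], dtype=complex)   # step 1: prepare |0><0|
    product = np.eye(2, dtype=complex)
    for _ in range(m):                                # steps 2-3: random Clifford sequence
        C = cliffords[rng.integers(len(cliffords))]
        rho = noisy(rho, C)
        product = C @ product
    rho = noisy(rho, product.conj().T)                # inverse of the whole sequence
    return rho[0, 0].real                             # steps 4-5: survival probability

cliffords = single_qubit_cliffords()
rng = np.random.default_rng(0)
lengths = [1, 5, 20, 50]
decay = [rb_survival(m, 0.01, cliffords, rng) for m in lengths]
```

Under this assumed noise model the survival probability follows $\frac{1}{2} + \frac{1}{2}(1 - p)^{m+1}$ exactly, which is the exponential RB decay. On hardware one would average over many random sequences per length and fit the decay constant.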

There are no standard criteria for choosing the circuit length $m$, the number of
random Clifford gate sets, or the number of measurement shots per circuit. Some tools [320] have
automated the process of choosing these parameters to ensure getting the most information out
of a small number of experiments.


Interleaved Randomized Benchmarking 
In order to characterize the error rate of a particular gate G, we can choose to interleave G
throughout the standard randomized benchmarking sequence [321]. Now the circuit becomes
$C_1, G, C_2, G, \ldots, C_m, G$, followed by the inverse gate.
Note that we still need to invert the whole sequence. When G is a Clifford gate, we can
always find such an inverse gate.

Now if we compare the RB decay of this interleaved RB sequence with that of the original 
standard RB sequence, we expect to obtain the effect of noise from the extra application of G, 
and thus bound the fidelity of G. 

A number of variants of randomized benchmarking have been proposed [322-326]. 
Each trades, to different degrees, complexity for more comprehensive noise characterizations. 
Some recent examples include extended randomized benchmarking (XRB) [323, 327] and cycle 
benchmarking (CRB) [328], which capture the behavior of a quantum system more realistically 
than the traditional RB does. 


8.2 NOISE MITIGATION STRATEGIES 


Noise mitigation is one of the biggest challenges facing the QC community. Without
strategies to reduce or get around physical noise, any execution of a quantum program is almost
always doomed to fail under such stringent conditions. Noise mitigation techniques work by
strategically designing more robust qubits with an ensemble of elements to prolong their
lifetimes, or by performing more accurate gate operations with a composition of pulses to improve
their fidelities, etc.

Current NISQ computers [14, 51, 56] lack the ability to isolate and control a sufficiently 
large number of quantum bits (qubits) with high precision. Scaling up a quantum computer 
means improvements in both the quantity and the quality of the qubits. On one hand, we want 
to equip a quantum computer with as many qubits as possible to accommodate large quantum 
applications. On the other hand, we want each qubit to be as long-lived and controllable as 
possible to run quantum programs fast and reliably. 


8.2.1 RANDOMIZED COMPILING 


Quantum gates are usually distinguished as "easy" or "hard" for a given architecture:
we define the gates that can be implemented with relatively high precision or low resource cost as
easy gates, and the others as hard gates. A typical division for a NISQ computer architecture is
single-qubit gates vs. multi-qubit gates. Single-qubit gates are considered easy because they
are usually associated with smaller error rates in a NISQ architecture.

Randomized compiling is a strategy where one inserts random gates into a quantum cir- 
cuit, and averages over many of those independently sampled random circuits. Remarkably, all 
coherent errors and non-Markovian noises can thus be converted into stochastic Pauli errors 
which are arguably easier to detect and correct, while preserving the logical operations of a 
quantum circuit. While the effect of noise on the individual random circuit may be different, 
the expected noise on multiple random circuits is scrambled and tailored into a simple stochastic 
form. The proof of the tailored noise is beyond the scope of the book, but we shall see in this 
section, how a quantum circuit is compiled into a random one that is more robust to noise. 

Specifically, the algorithm goes as follows. 


1. Arrange a quantum circuit $U$ into alternating cycles of easy and hard gates:
$U = \prod_{k=1}^{d} U_k G_k$, where $d$ is the number of cycles, and $G_k$ (or $U_k$) is a cycle
of easy (or hard) gates on $n$ qubits.


2. Insert a layer of randomly selected Pauli gates on $n$ qubits before the hard gates in each
cycle, denoted as $P_k^{(n)} = \bigotimes_{i=1}^{n} p_i$, where each $p_i$ is sampled from the
Pauli group $\mathcal{P} = \{I, X, Y, Z\}$. In the following, we will omit the implicit
superscript and denote the random Pauli gates for the $k$th cycle as $P_k$.


3. Insert a layer of correction gates after the hard gates in each cycle, denoted as $P_k^c$, such
that logical equivalence is preserved for the $k$th cycle: $U_k = P_k^c U_k P_k$.


4. Absorb the inserted gates around the easy gates in each cycle into the dressed form
$\tilde{G}_k = P_k G_k P_{k-1}^c$.
Figure 8.1: An example quantum circuit sliced into interleaved layers of single-qubit gates and
two-qubit gates. Shown here are the kth cycle and the (k + 1)th cycle.

Figure 8.2: Randomly sampled Pauli gates on n qubits are inserted before the hard gates, and
their corresponding correction gates are inserted after the hard gates.

Figure 8.3: The inserted gates are absorbed into the easy gates. As a result, the final random
circuit has the same depth as that of the original circuit.


More concretely, suppose the $k$th cycle consists of a layer of single-qubit gates (as easy
gates) denoted as $G_k$ and a layer of two-qubit gates (as hard gates) denoted as $U_k$, as shown
in Figure 8.1. After inserting the random twirling gates and their corrections, we arrive at the
circuit in Figure 8.2. Then the random gates are compiled together with the easy gates, as shown
in Figure 8.3.

[329] demonstrated that the randomized transformation can be viewed as a scrambling 
of noise on the quantum circuit. Thus, if we average over multiple shots of randomly sampled 
circuits, the randomization tailors the noise into stochastic Pauli errors. Consequently, we have 
obtained a logically equivalent circuit, whose circuit depth is unchanged asymptotically, but 
is more resilient to external noises. For detailed analysis, we refer the interested reader to the 
original work in [329]. 
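The key step of the algorithm, finding the correction layer for a randomly chosen Pauli layer, can be sketched for a single hard gate. The example below assumes the hard cycle is one CNOT (control on the first qubit) and computes $P^c = U P^\dagger U^\dagger$, which guarantees $P^c U P = U$. The helper names are hypothetical, and the sign returned is a global phase with no physical effect.

```python
import itertools
import numpy as np

I2 = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)
PAULIS = {"I": I2, "X": X, "Y": Y, "Z": Z}

# Assumed hard gate: CNOT with the first qubit as control.
CNOT = np.array([[1, 0, 0, 0],
                 [0, 1, 0, 0],
                 [0, 0, 0, 1],
                 [0, 0, 1, 0]], dtype=complex)

def correction_gate(U, P):
    """Return P^c = U P^dagger U^dagger, so that P^c U P = U."""
    return U @ P.conj().T @ U.conj().T

def as_signed_pauli(M):
    """Identify M as +/- (a tensor b) for single-qubit Paulis a, b."""
    for (name_a, a), (name_b, b) in itertools.product(PAULIS.items(), repeat=2):
        for sign in (1, -1):
            if np.allclose(M, sign * np.kron(a, b)):
                return sign, name_a + name_b
    raise ValueError("not a two-qubit Pauli up to sign")

# Sample a random Pauli layer on two qubits and compute its correction layer.
rng = np.random.default_rng(7)
names = ["IXYZ"[i] for i in rng.integers(4, size=2)]
P = np.kron(PAULIS[names[0]], PAULIS[names[1]])
Pc = correction_gate(CNOT, P)
sign, label = as_signed_pauli(Pc)
```

Because CNOT is a Clifford gate, the correction is itself a tensor product of Paulis. It is therefore an easy gate and can be absorbed into the neighboring single-qubit layer without increasing circuit depth.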
8.2.2  NOISE-AWARE MAPPING


As much as we want the qubits to be reliable and long-lived, there is still noise present in the
qubits themselves and in the gate operations we apply on them. With current superconducting
technology, it is hard to make identical transmon qubits in a device. Gate errors for different
qubits can vary by the hour.

Several recent quantum circuit mapping and scheduling techniques use heuristics to opti- 
mize for specific program input, physical machine size and physical topology. Two studies from 
Princeton [228] and GATech [229] go even further, observing from IBM daily calibration data 
that qubits and links between qubits vary substantially in their error rate. These compilers take 
these daily variations into account and optimize to increase the probability of correct program 
output. A follow-on study [330] demonstrates that this approach substantially reduces error rate 
as compared to native compilers for the IBM, Rigetti, and UMD quantum machines. 


8.2.3 CROSSTALK-AWARE SCHEDULING


In this section, we highlight a number of strategies for reducing crosstalk error in superconduct- 
ing architectures. A number of hardware features have been proposed to help mitigate crosstalk: 
(i) connectivity reduction, (ii) qubit frequency tuning, and (iii) coupler tuning. In addition to 
these hardware features, some software constraints are usually imposed to effectively reduce 
crosstalk; for example, certain operations may be prohibited to occur simultaneously. 

Connectivity reduction works by building devices with sparse connections between qubits, 
hence reducing the number of possible crosstalk channels. This greatly increases the circuit 
mapping and re-mapping overhead for executing a logical circuit, since many SWAP gates are 
needed. Moreover, this model necessitates an intelligent scheduler to serialize operations to 
avoid crosstalk [232]. This strategy is commonly deployed for fixed-frequency transmon archi- 
tectures, e.g., from IBM [51]. Because of their non-tunable nature, these architectures have 
stringent constraints on the initial qubit frequency; a number of optimizers are proposed for 
this issue [331, 332]. 

A second class of techniques relies on actively tuning qubit frequencies to avoid crosstalk,
featured in some prototypes [333] and by Google [334]. Software can decide when to schedule
an instruction and at which frequency to operate it. In this class, [335] found
a frequency assignment for the surface code circuit; [336] suggests a sudoku-style pattern of
frequency assignment for a cavity grid.

A third class builds not only frequency-tunable qubits but also tunable couplers between
qubits, termed "gmon" architectures [337]. Without resorting to permanently reducing device
connectivity in hardware, a different subset of connections is activated (via flux drives to the
couplers) at different time steps. As such, a schedule for when to activate couplers is needed. For
instance, Google proposes a tiling pattern in [66].

Most previous studies on quantum program compilation [159, 305] have largely targeted
short program execution time (i.e., low circuit depth), and neglected the impact of gate errors
such as crosstalk. Optimizations are performed at the gate level, typically involving strategic
qubit mapping and instruction scheduling. Recent efforts [232, 332] are among the first to
explore the compiler's role, such as designing an intelligent scheduler, in avoiding crosstalk.


8.3 QUANTUM ERROR CORRECTION 


Quantum error correction (QEC), first developed in [338, 339], achieves fault tolerance by re- 
peatedly discretizing continuous errors into digital errors and using many redundant qubits to 
flag any errors that have occurred to the quantum state, an idea that is not so distant from mem- 
ory refresh and classical coding theory. It allows us to track and correct errors in real-time while 
executing a quantum program. QEC is an extraordinary discovery—it not only explains why 
we can detect and correct a quantum error, but also provides a recipe for doing so systemati- 
cally. It is a blueprint for how to build a future large-scale quantum computer fault-tolerantly. 
Since the focus of the book is on near-term NISQ research, we have been postponing
discussions on QEC until now. A single section in the book cannot do justice to how remarkable
QEC is; nonetheless, we will demonstrate a selection of fundamental concepts in QEC, first via an
example, then via a generalized principle. For details about quantum error correction, we refer
the readers to [22, 25, 86, 157, 340, 341].


8.3.1 BASIC PRINCIPLES OF QEC 


As previously discussed, quantum systems are not ideal. There are many variables that have
an impact on the outcome of a computation. Limited gate fidelities and coherence times are among
the factors posing challenges. Quantum gate operations and control signals are not perfect, and
all of these errors build up to non-negligible amounts. That is why we need a way to correct
accumulating errors. This is the motivation behind quantum error correction. Simply put, the
purpose of quantum error correction can be summarized as protecting quantum circuits from
noise.


Quantum Error vs. Classical Error 

Classically, we are using bits, so the information is stored in 0's and 1's. Whenever there is an
error, the bit is the opposite of what it is supposed to be (i.e., a bit flip). Because classical errors
are just accidental bit flips, they are digitized. However, quantum errors are continuous. This
continuous error can be mathematically modeled as follows:

    $$|0\rangle \xrightarrow{X\ \text{gate}} \sqrt{\epsilon}\,|0\rangle + \sqrt{1 - \epsilon}\,|1\rangle. \qquad (8.1)$$

Even though physicists do their best to reduce this effect, it sometimes is not enough.
There are a lot of questions that arise from this situation. Are we able to detect and measure how
big or small this error $\epsilon$ is? Even if we can, is it better to correct it right now or later?
One of the hard questions to answer is: at what point do we decide to attempt to correct $\epsilon$?
Key Ingredients in Quantum Error Correction

There are two main ideas that make quantum error correction possible. One idea is to use
redundant encoding of information, just like in QR codes. This way, effects of noise in certain
parts of the system can be tolerated and will not end up corrupting the state of the system.
Another main idea is to digitize quantum errors, since we know how to deal with digitized errors,
as they resemble the classical case:


* redundancy to encode information and 


* digitizing quantum error. 


Quantum Error Correction Code (QECC) 

A quantum error correction code is a mapping from $k$ logical qubits to $n$ physical qubits.
Here, we must emphasize that $n$ is strictly greater than $k$, as it takes many physical qubits to
realize one logical qubit. The idea is to use $n$ physical qubits to encode (protect) $k$ qubits
of information. Exactly $n - k$ qubits are used for redundancy. This mapping can be shown as
follows:

    $$|0_L\rangle = |000\rangle \qquad (8.2)$$
    $$|1_L\rangle = |111\rangle. \qquad (8.3)$$


In the above example, $|0_L\rangle$ stands for the "logical" $|0\rangle$ state of the qubit, and
it is realized by three physical qubits. The strings 000 and 111 are called the logical codewords
of the code. Now suppose that with some small probability $p$, one of the physical qubits flipped,
and we got $|001\rangle$. The original "logical" qubit can still be recovered, for example through
a majority vote of qubits. We would conclude that the third qubit flipped, and the actual qubit
was $|0_L\rangle$.


How to Locate a Bit Flip? 

Continuing the above example and representation, locating bit flips can be accomplished by
looking at the output sequences of two-qubit operators. These operators are ZZI and IZZ, where
each letter acts on one qubit, in order. For example, ZZI means a Z gate is applied to
both the first and the second qubit and the third qubit is left untouched. Recall that

    $$Z\,|0\rangle = |0\rangle \qquad (8.4)$$
    $$Z\,|1\rangle = -|1\rangle. \qquad (8.5)$$


Now, for a state $|\psi\rangle$ we can look at what the eigenvalues of these two-qubit operators
are. If we apply both of these two-qubit operators consecutively, we can determine which bit
flipped. Now suppose that $|\psi\rangle = |100\rangle$. This means

    $$ZZI\,|100\rangle = -|100\rangle \qquad (8.6)$$
    $$IZZ\,|100\rangle = |100\rangle. \qquad (8.7)$$


The eigenvalues observed (in order) are $(-1, +1)$. This sequence tells us that it is the first
qubit that is flipped. For instance, if the second qubit was flipped, we would instead observe the
sequence $(-1, -1)$. Similarly, we would see $(+1, -1)$ if the third qubit was flipped.

If one wishes to locate a phase flip of a qubit, then all Z gates should be replaced
by X gates, and all $|0\rangle$ and $|1\rangle$ should be replaced by $|+\rangle$ and $|-\rangle$.
This preserves the stabilizer formalism, as the X gate gives $(+1, -1)$ as eigenvalues when it acts
on $(|+\rangle, |-\rangle)$. Everything else remains the same.
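The bit-flip syndrome lookup described above can be sketched as follows. As a simplification, the sketch evaluates the eigenvalues of ZZI and IZZ directly as expectation values on a state vector instead of measuring them with ancilla qubits. The function names and the example superposition are illustrative assumptions.

```python
import numpy as np

I2 = np.eye(2)
X = np.array([[0.0, 1.0], [1.0, 0.0]])
Z = np.array([[1.0, 0.0], [0.0, -1.0]])

def kron_all(*ops):
    """Tensor product of single-qubit operators, first qubit leftmost."""
    out = np.array([[1.0]])
    for op in ops:
        out = np.kron(out, op)
    return out

ZZI = kron_all(Z, Z, I2)
IZZ = kron_all(I2, Z, Z)

# Syndrome (eigenvalues of ZZI, IZZ) -> index of the flipped qubit, as in the text.
SYNDROME_TO_FLIP = {(+1, +1): None, (-1, +1): 0, (-1, -1): 1, (+1, -1): 2}

def basis_state(bits):
    """State vector for a computational basis string such as '100'."""
    v = np.zeros(2 ** len(bits))
    v[int(bits, 2)] = 1.0
    return v

def syndrome(state):
    """Eigenvalues of ZZI and IZZ (valid when the state has at most one bit flip)."""
    return tuple(int(round(state @ op @ state)) for op in (ZZI, IZZ))

def correct(state):
    """Apply X to the qubit indicated by the syndrome, if any."""
    flipped = SYNDROME_TO_FLIP[syndrome(state)]
    if flipped is None:
        return state
    ops = [I2, I2, I2]
    ops[flipped] = X
    return kron_all(*ops) @ state

# An encoded superposition 0.6|000> + 0.8|111> with a bit flip on the first qubit.
corrupted = 0.6 * basis_state("100") + 0.8 * basis_state("011")
recovered = correct(corrupted)
```

The syndrome identifies the flipped qubit without revealing the amplitudes 0.6 and 0.8, which is why the encoded superposition survives the correction.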


Check Matrix Formalism 

In the literature, the method for locating bit flips is extended to the general case through
the check matrix formalism. The idea of a check matrix is to create a set of qubit operations using
the stabilizer formalism, with enough distinct sequences of eigenvalues to determine which
qubit is flipped. Each row in the check matrix is a gate operation that needs to be applied to
the system, and each column represents a physical qubit. For example, the check matrix
for the above example would contain two rows, one for ZZI and one for IZZ. It would
also contain three columns, as there are three physical qubits in that system. The check matrix
would be:

$$\begin{pmatrix} 1 & 1 & 0 \\ 0 & 1 & 1 \end{pmatrix}, \qquad (8.8)$$

where 1's stand for Z (for bit flip) or X (for phase flip) gates, and 0's for the identity matrix. This
matrix shows that first, ZZI must be applied, followed by IZZ. A more complicated example where
eight physical qubits are used would be

$$\begin{pmatrix} 1 & 1 & 1 & 1 & 0 & 0 & 0 & 0 \\ 1 & 1 & 0 & 0 & 1 & 1 & 0 & 0 \\ 1 & 0 & 1 & 0 & 1 & 0 & 1 & 0 \end{pmatrix}, \qquad (8.9)$$

where we see that a series of three gate operations is necessary to encode enough sequences so
that one can distinguish which qubit is flipped.
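As a small sketch of how a check matrix is used, the snippet below multiplies a check matrix by an error vector over GF(2). Here a syndrome bit of 1 stands for the eigenvalue $-1$ and a bit of 0 for $+1$; the function names are illustrative assumptions, not notation from the literature.

```python
def syndrome_bits(check_matrix, error):
    """Multiply the check matrix by the error vector over GF(2)."""
    return tuple(sum(h * e for h, e in zip(row, error)) % 2 for row in check_matrix)


def single_flip(n, i):
    """Error vector with a single flip on qubit i (0-indexed) out of n."""
    return [1 if j == i else 0 for j in range(n)]


def locates_every_single_flip(check_matrix):
    """True if every single flip has a distinct, nonzero syndrome."""
    n = len(check_matrix[0])
    seen = {syndrome_bits(check_matrix, single_flip(n, i)) for i in range(n)}
    return len(seen) == n and tuple([0] * len(check_matrix)) not in seen


H3 = [[1, 1, 0],   # ZZI
      [0, 1, 1]]   # IZZ

table = {syndrome_bits(H3, single_flip(3, i)): i for i in range(3)}
print(table)
print(locates_every_single_flip(H3))
```

The syndrome of a single flip on qubit i is simply column i of the check matrix, so a single flip can be located whenever the columns are distinct and nonzero.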


Nine-qubit Shor Code 

Of course, using physical qubits to protect against bit flips would be of no use if one does not also
protect against phase flips, and vice versa. Unfortunately, each of the physical qubits individually
needs to be protected by a second layer of concatenated physical qubits in this case. This gives
rise to what is called the nine-qubit Shor code, a two-layer, 3 x 3 physical qubit set that protects
against both phase and bit flips and encodes one logical qubit. This way, one layer protects against
phase flips and the other against bit flips. The logical qubit encoded this way gives us:

$$|0_L\rangle = \frac{1}{2\sqrt{2}} (|000\rangle + |111\rangle)(|000\rangle + |111\rangle)(|000\rangle + |111\rangle) \qquad (8.10)$$
$$|1_L\rangle = \frac{1}{2\sqrt{2}} (|000\rangle - |111\rangle)(|000\rangle - |111\rangle)(|000\rangle - |111\rangle). \qquad (8.11)$$

Figure 8.4: The circuit describing a projective measurement for operation A on state $|\psi\rangle$. Here,
A can mean any stabilizer n-qubit gate. For example, it can mean IZZ described above. There
needs to be an additional ancilla qubit for this process.

In this case, the operators that check whether phase flips or bit flips occurred change. We need three
sets of bit flip checks, and two sets of phase flip checks. These gates are given below:

$$Z_1 Z_2, \quad Z_2 Z_3 \qquad (8.12)$$
$$Z_4 Z_5, \quad Z_5 Z_6 \qquad (8.13)$$
$$Z_7 Z_8, \quad Z_8 Z_9 \qquad (8.14)$$
$$X_1 X_2 X_3 X_4 X_5 X_6 \qquad (8.15)$$
$$X_4 X_5 X_6 X_7 X_8 X_9. \qquad (8.16)$$
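To make these checks concrete, the following sketch writes the eight operators as length-9 Pauli strings, verifies that they mutually commute, and computes the syndrome of a single-qubit error. It assumes the standard Shor code stabilizers $Z_1Z_2, Z_2Z_3, Z_4Z_5, Z_5Z_6, Z_7Z_8, Z_8Z_9, X_1\cdots X_6, X_4\cdots X_9$; the helper functions are our own illustrative names.

```python
def pauli_string(n, letter, qubits):
    """Length-n Pauli string with `letter` on the given 1-indexed qubits."""
    return "".join(letter if q in qubits else "I" for q in range(1, n + 1))


def commute(p, q):
    """Two Pauli strings commute iff they differ on an even number of
    positions where both are non-identity."""
    clashes = sum(1 for a, b in zip(p, q) if a != "I" and b != "I" and a != b)
    return clashes % 2 == 0


SHOR_STABILIZERS = [
    pauli_string(9, "Z", {1, 2}), pauli_string(9, "Z", {2, 3}),
    pauli_string(9, "Z", {4, 5}), pauli_string(9, "Z", {5, 6}),
    pauli_string(9, "Z", {7, 8}), pauli_string(9, "Z", {8, 9}),
    pauli_string(9, "X", {1, 2, 3, 4, 5, 6}),
    pauli_string(9, "X", {4, 5, 6, 7, 8, 9}),
]


def syndrome(error):
    """+1 where the error commutes with a stabilizer, -1 where it anticommutes."""
    return tuple(+1 if commute(g, error) else -1 for g in SHOR_STABILIZERS)


all_commute = all(commute(g, h) for g in SHOR_STABILIZERS for h in SHOR_STABILIZERS)
print(all_commute)
print(syndrome(pauli_string(9, "X", {1})))  # bit flip on qubit 1
print(syndrome(pauli_string(9, "Z", {1})))  # phase flip on qubit 1
```

A bit flip is flagged only by the Z-type checks and a phase flip only by the X-type checks, which is why both layers are needed.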


Projective Measurements 

We now describe how to implement the stabilizer operators in a quantum circuit. The stabilizer
operators (including the example of ZZI in the previous section) are implemented as projective
measurements. To see why, we calculate the quantum state at each time step of the circuit in
Figure 8.4:

1. $|0\rangle |\psi\rangle$
2. $|+\rangle |\psi\rangle$
3. $\frac{1}{\sqrt{2}} \left( |0\rangle |\psi\rangle + |1\rangle A |\psi\rangle \right)$
4. $\frac{1}{2} \left[ (|0\rangle + |1\rangle) |\psi\rangle + (|0\rangle - |1\rangle) A |\psi\rangle \right] = |0\rangle \frac{I+A}{2} |\psi\rangle + |1\rangle \frac{I-A}{2} |\psi\rangle$

where the operators $\frac{I+A}{2}$ and $\frac{I-A}{2}$ are called "projectors." It can be shown that any arbitrary state
$|\psi\rangle$ can be decomposed into orthogonal states as follows:

$$|\psi\rangle = \alpha |\psi_+\rangle + \beta |\psi_-\rangle, \qquad (8.17)$$

where $|\psi_+\rangle$ and $|\psi_-\rangle$ are the eigenstates of A with eigenvalues $(+1, -1)$, respectively. In other
words,

$$A |\psi_+\rangle = |\psi_+\rangle, \qquad A |\psi_-\rangle = -|\psi_-\rangle. \qquad (8.18)$$

One can think of these states as the "no error" and "error" states, since when we have no
error the stabilizer operators give us an eigenvalue of $+1$, and when we have an error the eigenvalue is $-1$. Therefore,
we can see further that

$$\frac{I+A}{2} |\psi\rangle = \frac{I+A}{2} \left( \alpha |\psi_+\rangle + \beta |\psi_-\rangle \right) = \alpha |\psi_+\rangle, \qquad (8.19)$$
$$\frac{I-A}{2} |\psi\rangle = \frac{I-A}{2} \left( \alpha |\psi_+\rangle + \beta |\psi_-\rangle \right) = \beta |\psi_-\rangle, \qquad (8.20)$$

which shows us that we recover the original "no error" state $|\psi_+\rangle$ with probability $|\alpha|^2$ and the
"error" state $|\psi_-\rangle$ with probability $|\beta|^2$. So if $\alpha = \sqrt{1-\epsilon}$ and $\beta = \sqrt{\epsilon}$ where $\epsilon \ll 1$ (i.e., the error is
small), we recover the "no error" state with high probability. This procedure shows that the
projectors transform the arbitrary state $|\psi\rangle$ into one of the two states, the "no error" and the
"error" states.

For instance, measuring the stabilizer operator $X_1 X_2 X_3$ means that we perform a projective
measurement using an ancilla qubit, as shown in Figure 8.5.

Figure 8.5: Syndrome measurement for the stabilizer operator $X_1 X_2 X_3$.
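The action of the two projectors can be checked numerically. The sketch below is a minimal example that assumes A is a product of Z gates (here ZZI) and stores the state as a dictionary from bit-strings to amplitudes; the function names and the value of `eps` are illustrative choices.

```python
import math


def apply_z_string(state, support):
    """Apply a product of Z gates (e.g. "ZZI") to {bitstring: amplitude}."""
    out = {}
    for bits, amp in state.items():
        ones = sum(1 for b, op in zip(bits, support) if op == "Z" and b == "1")
        out[bits] = -amp if ones % 2 else amp
    return out


def project(state, support, sign):
    """Return the unnormalized state (I + sign * A)/2 |psi>."""
    a_state = apply_z_string(state, support)
    return {bits: (amp + sign * a_state[bits]) / 2 for bits, amp in state.items()}


def norm_squared(state):
    return sum(abs(amp) ** 2 for amp in state.values())


eps = 0.01
# alpha |000> + beta |100>: a small amplitude for a flip on the first qubit.
psi = {"000": math.sqrt(1 - eps), "100": math.sqrt(eps)}

p_no_error = norm_squared(project(psi, "ZZI", +1))
p_error = norm_squared(project(psi, "ZZI", -1))
print(p_no_error, p_error)
```

The squared norms of the two projected states are the probabilities of reading 0 or 1 on the ancilla, so they sum to one.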


8.3.2 STABILIZER CODES 


Using the stabilizer formalism defined earlier, we can construct a family of quantum error
correction codes called stabilizer codes. Thanks to the simplicity of the formalism, we are able to borrow
concepts from linear codes in classical coding theory.

Definition 8.6 An [n, k] stabilizer code C(S) is defined as the vector space stabilized by the
operators from the abelian subgroup $S = \langle g_1, g_2, \ldots, g_{n-k} \rangle$, where each $g_i \in P_n \setminus \{-I\}$ is a stabilizer
from the n-qubit Pauli group, represented as a length-n Pauli string. k is the number of logical
qubits that C(S) encodes.

$$C(S) = \{ |\psi\rangle \in \mathcal{H} \ \text{s.t.} \ g |\psi\rangle = |\psi\rangle \ \forall g \in S \}.$$

The nine-qubit Shor code can therefore be written as a [9, 1] stabilizer code, which uses 9
physical qubits to encode one logical qubit and whose stabilizers are given in Equations (8.12)-(8.16).

With this definition of a quantum error correction code, we have the following theorem 
showing the set of errors that can be corrected by the stabilizer code. 


Theorem 8.7 Given a set of Pauli errors $\mathcal{E}$, if for all $E_j, E_k \in \mathcal{E}$ with $j \neq k$, $\exists g \in S$ s.t. $E_j^\dagger E_k g = -g E_j^\dagger E_k$,
then the set of errors $\mathcal{E}$ is correctable by the stabilizer code C(S).


The proof of this theorem can be found in [86]. Effectively, if a Pauli error anticommutes
with a stabilizer, then the stabilizer can detect an occurrence of the error. This is
because, upon projective measurement of g, we can observe that $E_j |\psi\rangle$ is projected to the $-1$
eigenstate of g, indicating the occurrence of an error. The series of projective measurement
outcomes is called the syndrome. As such, each error will leave a signature in the syndrome. To tell
two errors apart, we need their syndromes to be distinct. Identifying the error from its syndrome
is called decoding. Once the error is identified, we can correct it appropriately.
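The condition of Theorem 8.7 is easy to test by brute force for small codes. The sketch below does so for the three-qubit bit-flip code with stabilizers ZZI and IZZ. Pauli strings are multiplied up to an overall phase, which does not affect commutation, and all function names are illustrative.

```python
from itertools import combinations


def commute(p, q):
    """Pauli strings commute iff they clash on an even number of positions."""
    clashes = sum(1 for a, b in zip(p, q) if a != "I" and b != "I" and a != b)
    return clashes % 2 == 0


def multiply(p, q):
    """Product of two Pauli strings, ignoring the overall phase."""
    out = []
    for a, b in zip(p, q):
        if a == "I":
            out.append(b)
        elif b == "I":
            out.append(a)
        elif a == b:
            out.append("I")
        else:
            out.append(({"X", "Y", "Z"} - {a, b}).pop())
    return "".join(out)


def distinguishable(errors, stabilizers):
    """Check that for every pair of distinct errors, some stabilizer
    anticommutes with their product (the condition of Theorem 8.7)."""
    for e_j, e_k in combinations(errors, 2):
        product = multiply(e_j, e_k)
        if all(commute(product, g) for g in stabilizers):
            return False
    return True


STABILIZERS = ["ZZI", "IZZ"]
print(distinguishable(["III", "XII", "IXI", "IIX"], STABILIZERS))
print(distinguishable(["XII", "IXX"], STABILIZERS))
```

The second call fails the condition because a flip on the first qubit and a simultaneous flip on the other two produce the same syndrome, which is the reason the three-qubit code only corrects a single bit flip.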


8.3.3 TRANSVERSALITY AND EASTIN-KNILL THEOREM 


Once the error correction code is defined, the next step is to define how to implement logical
operations on the codewords of the code fault-tolerantly. After all, we need to prevent errors from
propagating through computation. For each error correction code, there is a class of gates whose
logical gate operations (i.e., encoded gates) are easy to implement fault-tolerantly, namely the
transversal quantum gates. To prevent the propagation of error during a logical operation, we can
impose the requirement that each physical gate for the logical gate acts on at most one physical
qubit in each n-qubit code block of the [n, k] code. For example, if the logical Hadamard
gate consists of a physical Hadamard gate on each of the n physical qubits, $H_L = \bigotimes_{j=1}^{n} H$, it would
be considered a transversal Hadamard. In the case of stabilizer codes, for example, the logical
X (i.e., $|0\rangle_L \to |1\rangle_L$) and logical Z (i.e., $|1\rangle_L \to -|1\rangle_L$) operations can be derived by finding an
operator $h \in P_n \setminus S$ that commutes with all $g \in S$. Other logical operations are potentially more
difficult to implement.

Transversal gates are preferable because the noisy physical gates are localized in each
code block, preventing errors from spreading uncontrollably through computation. However,
the Eastin-Knill theorem states that no quantum error correction code can transversally
implement a universal gate set. So we have to circumvent the theorem using other techniques to
implement fault-tolerant quantum gates. We motivate a class of such techniques by Knill's
error correction picture using gate teleportation.

Figure 8.6: An encoded teleportation circuit. The input to the teleportation circuit is the
encoded qubits and the encoded EPR pair; the circuit consists of the encoded Bell-basis
measurement and encoded recovery Pauli operators.

Figure 8.7: Circuit diagram for teleporting a projective measurement $\Pi(S)$. The output of the
circuit is $R\,\Pi(S) |\psi\rangle = \Pi(S) R' |\psi\rangle$.


8.3.4 KNILL'S ERROR CORRECTION PICTURE


Knill's error correction picture differs from conventional error correction in many ways;
we highlight one of the differences, namely the concept of error correcting teleportation [342],
which generalizes gate teleportation [234]. In this picture, error correction is combined
with a logical gate into one step, instead of the conventional syndrome-based scheme discussed
earlier. The error correcting teleportation circuit generalizes the teleportation circuit
to use encoded states and encoded gates, as shown in Figure 8.6.

The key observation in Knill's error correction picture is that a stabilizer projection on the 
encoded qubits before teleportation is equivalent to a stabilizer projection after the teleportation 
(up to Pauli modification due to the recovery operator at the end of the teleportation circuit), as 
shown in Figure 8.7. 

The argument is similar to that of the gate teleportation technique for unitary
gates, but generalized to projective measurements. Consequently, the syndrome of the input
state can be deduced from the teleportation Bell measurement [342].

Figure 8.8: The role of magic states in the Knill error correction picture. The magic state $|A\rangle$
is teleported to implement a T gate fault-tolerantly. Here, $P' = T X T^\dagger = X S^\dagger$. All gates (i.e.,
CNOT gate, $P'$ gate, and measurement) are implemented fault-tolerantly.


Magic State Distillation 

We can use this picture of error correction to motivate a technique that aims to circumvent the
Eastin-Knill theorem, namely magic state distillation. One of the advantages of the error
correcting teleportation picture [233, 342] is that the difficulty in performing a logical gate U
fault-tolerantly on the encoded $|\psi\rangle$ is shifted to the difficulty in preparing an encoded resource
state (also known as a magic state) fault-tolerantly. The latter is in general easier than
performing an operation on an unknown state: the magic state can be prepared offline
(i.e., prior to the computation), and we can
discard a resource state and start over in case the preparation circuit fails.

Magic state distillation, first proposed by Bravyi and Kitaev [28], is precisely the process
of fault-tolerantly preparing a resource state that corresponds to a non-transversal gate for the
error correction code.

A widely studied magic state is the resource state for the T gate (i.e., the $\pi/8$ gate). For example,
we can input the following resource state:

$$|A\rangle = T |+\rangle = \frac{1}{\sqrt{2}} \left( e^{-i\pi/8} |0\rangle + e^{i\pi/8} |1\rangle \right)$$

to the (single-qubit version) teleportation circuit, as shown in Figure 8.8.
As a result, the problem of implementing the non-transversal T gate fault-tolerantly is
reduced to the problem of preparing (in advance) high-fidelity magic states.
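The reduction can be checked on unencoded qubits with a short state-vector calculation. The sketch below uses one common variant of T-gate injection, which may differ in wiring from the circuit of Figure 8.8: the data qubit controls a CNOT onto the magic state, the magic qubit is measured, and a Clifford S correction is applied when the outcome is 1. It writes $T = \mathrm{diag}(1, e^{i\pi/4})$, which equals $\mathrm{diag}(e^{-i\pi/8}, e^{i\pi/8})$ up to a global phase; all names are illustrative.

```python
import cmath
import math

OMEGA = cmath.exp(1j * math.pi / 4)


def t_gate(psi):
    """Reference T gate, diag(1, e^{i pi/4}), applied directly."""
    return [psi[0], OMEGA * psi[1]]


def inject_t(psi, outcome):
    """Apply T to `psi` by consuming one magic state |A> = T|+>,
    for a given measurement outcome (0 or 1) on the magic qubit."""
    magic = [1 / math.sqrt(2), OMEGA / math.sqrt(2)]
    # joint[2*d + a] is the amplitude of data bit d and magic bit a.
    joint = [psi[d] * magic[a] for d in (0, 1) for a in (0, 1)]
    # CNOT with the data qubit as control and the magic qubit as target.
    joint = [joint[0], joint[1], joint[3], joint[2]]
    # Keep the branch consistent with the measured value of the magic qubit.
    branch = [joint[outcome], joint[2 + outcome]]
    norm = math.sqrt(sum(abs(x) ** 2 for x in branch))
    branch = [x / norm for x in branch]
    if outcome == 1:
        branch = [branch[0], 1j * branch[1]]  # Clifford correction S
    return branch


def overlap(u, v):
    """Absolute value of the inner product <u|v>."""
    return abs(sum(a.conjugate() * b for a, b in zip(u, v)))


psi = [0.6, 0.8j]
for outcome in (0, 1):
    print(outcome, overlap(t_gate(psi), inject_t(psi, outcome)))
```

Both measurement outcomes yield $T|\psi\rangle$ up to a global phase, using only a CNOT, a measurement, and a Clifford correction in addition to the magic state.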


T Gates in Quantum Algorithms 

To put the cost of QEC in perspective, we examine the overhead of non-transversal operations
such as S and T gates. S and T gates are important operations in many useful quantum
algorithms, and their error-corrected execution requires magic state resources. When the number
of T gates in an application is low, the circuit can in fact be efficiently simulated
classically [180]. T gates have been shown to comprise between 25% and 30% of the instruction
stream of useful quantum applications [343]. Others claim even higher percentages for specific
application sets, between 40% and 47% [344].

For an estimate of the total number of required T gates in these applications, take as an
example the algorithm to estimate the molecular ground state energy of the molecule Fe2S2. It
requires approximately $10^4$ iteration steps for "sufficient" accuracy, each comprising $7.4 \times 10^6$
rotations [345]. Each of these controlled rotations can be decomposed to sufficient accuracy
using approximately 50 T gates per rotation [215]. All of this combines to yield a total number
of T gates of order $10^{12}$. As a result, it is crucial to optimize the resource overhead required
by the execution of T gates at this scale to ensure the successful execution of many important
quantum algorithms.
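As a quick check of the estimate above, multiplying the three figures quoted in the text (iteration steps, rotations per step, and T gates per rotation) gives

```latex
N_T \approx \underbrace{10^{4}}_{\text{iterations}} \times \underbrace{7.4 \times 10^{6}}_{\text{rotations per iteration}} \times \underbrace{50}_{\text{T gates per rotation}} = 3.7 \times 10^{12},
```

which is indeed of order $10^{12}$. Each of these T gates consumes one distilled magic state.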


Bravyi-Haah Distillation Protocol 

The last piece of the problem is how to prepare the resource states fault-tolerantly. Distillation
protocols are circuits that accept as input a number of potentially faulty raw magic states, use
some ancillary qubits, and output a smaller number of higher-fidelity magic states. The
input-output ratio, denoted as $n \to k$, assesses the efficiency of a protocol. Below we focus on a popular,
low-overhead distillation protocol known as the Bravyi-Haah distillation protocol [346].

To produce k magic states, Bravyi-Haah state distillation circuits take as input 3k + 8
low-fidelity states, use k + 5 ancillary qubits, and k additional qubits for higher-fidelity output
magic states, and are thus denoted as the $3k + 8 \to k$ protocol. The total number of qubits involved in
each such circuit is then 5k + 13, which defines the area cost of the circuit module.

The intuition behind the protocol is to "make good magic states from many bad ones."
Given a number of low-fidelity states, the protocol uses a syndrome measurement technique to
verify quality, and discards states that are bad. Then, the circuit will convert the subset of good
states into a single qubit state. The output magic states will have a suppression of error only
if the filtering and conversion follow a particular pattern. This is specified by the parity-check
matrix in the protocol. Notably, if the input (injected) states are characterized by error rate $\epsilon_{\text{inject}}$,
the output error rate is suppressed by this procedure to $(1 + 3k)\epsilon_{\text{inject}}^2$. Due to the filtering
step, the success probability of the protocol is, to first order, given by $1 - (8 + 3k)\epsilon_{\text{inject}} + \cdots$.
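The figures above are simple enough to tabulate. The following sketch evaluates them for one round of the protocol; the function name, the dictionary keys, and the example values of `k` and `eps_inject` are our own choices, and the error and success formulas are the first-order expressions quoted in the text.

```python
def bravyi_haah_round(k, eps_inject):
    """First-order figures of merit for one round of the 3k+8 -> k protocol."""
    return {
        "input_states": 3 * k + 8,
        "ancilla_qubits": k + 5,
        "output_states": k,
        "total_qubits": 5 * k + 13,
        "output_error": (1 + 3 * k) * eps_inject ** 2,
        "success_probability": 1 - (8 + 3 * k) * eps_inject,
    }


stats = bravyi_haah_round(k=8, eps_inject=1e-3)
for name, value in stats.items():
    print(name, value)
```

The quadratic suppression of the error comes at the price of a success probability that decreases linearly with the number of input states, which is why the choice of k matters for the overall overhead.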


8.4 SUMMARY AND OUTLOOK


Quantum computation requires the mitigation of errors caused by imprecise controls and natural
decoherence. Noise mitigation and error correction techniques are critical in making quantum
computation practical, especially in the NISQ era. In this chapter, we introduced the leading
techniques for characterizing and modeling noise, leading to a better understanding of the
effects of quantum noise. We also described several physical-level noise mitigation techniques for
reducing crosstalk and coherent errors. Finally, we illustrated the basics of quantum error
correction codes, which have the potential to fault-tolerantly execute arbitrary quantum
computation.

Further Reading

At the physical level, two widely used noise mitigation techniques are composite
pulses [298] (for systematic errors) and dynamical decoupling [347] (for coherent dephasing
errors). Scalability remains a challenge in noise mitigation methods. A quantum system needs
more intelligent methods for calibration, as it is infeasible to exhaustively calibrate every
qubit as well as every coupling between two qubits. It remains an open problem to characterize realistic
quantum noise [328, 348] and to simulate it classically [349-351].

Quantum error correction is a beautiful mathematical proposal for scaling up quantum
computation with fault tolerance. The capstone of QEC is the "threshold theorem" [352].
It is one of the most remarkable results of QEC, and states, at a high level, that
if the physical error rate $p$ is less than some threshold $p_{th}$, then we can perform fault-tolerant
quantum computation to accuracy $\epsilon$ with only a moderate increase in circuit size (that is
polylogarithmic in $1/\epsilon$) by concatenation of QEC, given reasonable assumptions
about the physical noise model. The details of the theorem are outside the scope of this review. We
refer readers interested in QEC to [23, 25, 86]. In the near term, numerous efforts have been
put into designing low-overhead quantum error correction codes adapted to different noise
models and program characteristics, as well as into reducing the cost of magic state distillation
protocols.

CHAPTER 9


Classical Simulation of 
Quantum Computation 


Last but not least, we discuss the classical simulation of quantum computation, where we ex- 
plore the techniques for efficiently simulating quantum circuits on a classical computer. This is 
an important subject, in part because it allows us to execute a quantum program and verify its 
correctness even when no quantum hardware is available. It is thus an essential tool for testing 
and debugging quantum programs. But it also sheds light on the not-so-well-understood
boundary between the computational power of classical computers and quantum computers. We would
like to understand how many quantum processes (if not all) can be efficiently simulated on
a classical computer. In other words, understanding various classical simulation techniques can
give us insight into the key ingredients in quantum computing that bring the advantage
in computing power. For instance, is entanglement responsible for quantum speedup, or is there
more to it? These are the kinds of questions raised, and hopefully answered, when we study the
simulation of quantum computation. After defining what classical simulation means, we direct
the reader's attention to some leading techniques, namely simulation using density matrices,
stabilizer formalism, and graphical models.


9.1 STRONG VS. WEAK SIMULATION: AN OVERVIEW 


The aim of a classical simulation is to mimic the dynamics of a quantum system, and accurately
reproduce the outcomes of a quantum circuit. Recall from earlier chapters that for every execution
of a quantum circuit on a quantum computer, we obtain a sample bit-string from a probability
distribution resulting from the measurements at the end of the circuit. More specifically, consider a
quantum circuit U with n input qubits and N gates. For an efficient quantum circuit,
N is usually O(poly(n)). Upon measuring the qubits, we read out a sample bit-string, denoted
as $\alpha \in \{0, 1\}^n$, with probability:

$$P(\alpha) = |\langle \alpha | U | 00 \ldots 0 \rangle|^2.$$

Now, a quantum computer can report the result from measuring qubits at the end; we are
interested in doing it classically. In general, a classical computer has two options, namely to strongly
or weakly simulate the resulting probability distribution.


Definition 9.1 Strong simulation aims to calculate the probabilities of the output measurement 
outcomes efficiently with high accuracy using a classical computer. 


With strong simulation, a classical computer needs to show that it has reproduced the
outcome of a quantum circuit by writing down the probabilities of some or all of the possible
bit-strings. To verify, we can run the quantum computer multiple times and check if its
sampling probability is close to the value reported by the classical computer. More specifically, we
emphasize that there are typically two styles of strong simulation:

1. "Evaluating all amplitudes": we aim to calculate $P(\alpha)$ for all $\alpha$, and

2. "Evaluating one amplitude": we calculate the probability of one of the outcomes, e.g.,
$P(0 \ldots 0)$.


With weak simulation, a classical computer performs as a sampling device, acting more
closely to how a quantum computer does. In particular, we have the following definition.


Definition 9.2 Weak simulation aims to sample once from the output distribution efficiently 
using a classical computer. 


Each time you simulate a circuit weakly, you will obtain an outcome $\alpha$ drawn from
a distribution close to the true probability distribution $P(\alpha)$ that a quantum
computer would have sampled from.

We remark that strong and weak simulations are fundamentally different notions. In other
words, we can find some quantum circuits that are trivially simulable weakly, but are unlikely
to be efficiently simulable strongly. For example, [353] shows that strong simulation of some
quantum circuits is #P-complete.

Furthermore, strong simulation implies weak simulation. This direction is simple: if
the probabilities are calculated, then you can sample according to the probabilities. The reverse
direction is less obvious: if you can sample once in polynomial time and there are exponentially
many possible outcomes, it is not immediately clear
how to recover all amplitudes with accuracy. Techniques developed for classical simulation have
focused on simulating quantum circuits strongly. However, weak simulation is closer to
what we are interested in physically, because a quantum device produces one sample at a time upon
measurement. Strong simulation, especially for evaluating all amplitudes, may after all be too
harsh a requirement on classical computers.
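The direction from strong to weak simulation can be sketched as follows. The example assumes the strong simulator has already produced the full table of outcome probabilities, here hard-coded for a three-qubit GHZ circuit as a stand-in; the function names are illustrative. Note that writing down the whole table takes space exponential in n, so this naive conversion is only practical for small circuits.

```python
import random


def strong_simulate_ghz():
    """Stand-in strong simulator: outcome probabilities of a 3-qubit GHZ circuit."""
    return {"000": 0.5, "111": 0.5}


def weak_simulate(probabilities, rng):
    """Draw one bit-string according to the given outcome probabilities."""
    outcomes = list(probabilities)
    weights = [probabilities[o] for o in outcomes]
    return rng.choices(outcomes, weights=weights, k=1)[0]


rng = random.Random(0)
probs = strong_simulate_ghz()
samples = [weak_simulate(probs, rng) for _ in range(1000)]

# Going the other way, samples only give a statistical estimate of a probability.
estimate = samples.count("000") / len(samples)
print(estimate)
```

The estimate at the end illustrates why the reverse direction is harder: a finite number of samples only approximates a probability, and outcomes with exponentially small probability may never be observed at all.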

Despite the common belief that classical simulation of universal quantum circuits does not
scale well, efficient simulations for some restricted classes of circuits exist. One such example
in this book is the class of Clifford circuits, which have been shown to be efficiently simulable using the
stabilizer formalism. These restricted classes also tell us what features in quantum circuits enable
the quantum computational power.


9.1.1 DISTANCE MEASURES 


To evaluate how well a classical simulation reproduces the probability distribution of a
quantum circuit, we define some relevant distance measures for classical probability distributions.
There are many different distances used for comparing two probability distributions p and q;
we mostly follow the analysis in [354, 355]. Consider two discrete probability distributions
$p = (p_1, \ldots, p_d)$ and $q = (q_1, \ldots, q_d)$ over the same space $\Omega$ where $|\Omega| = d$.


Definition 9.3 The total variation distance between p and q is defined as

$$d_{TV}(p, q) = \frac{1}{2} \sum_{i=1}^{d} |p_i - q_i| = \frac{1}{2} \|p - q\|_1.$$

The total variation distance, which takes values between 0 and 1, measures the worst probability
discrepancy that p and q assign to the same event, i.e., $d_{TV}(p, q) = \max_{x \subseteq \Omega} |\Pr_p[x] - \Pr_q[x]|$,
where the maximum is taken over all events $x \subseteq \Omega$.


Definition 9.4 The $\ell_2$ distance between p and q is defined as

$$d_{\ell_2}(p, q) = \left( \sum_{i=1}^{d} (p_i - q_i)^2 \right)^{1/2} = \|p - q\|_2.$$

The $\ell_2$ distance, which takes values between 0 and $\sqrt{2}$, is related to the total variation distance
by $d_{\ell_2}(p, q) \le 2 d_{TV}(p, q) \le \sqrt{d}\, d_{\ell_2}(p, q)$.


Definition 9.5 The Hellinger distance between p and q is defined as

$$d_H(p, q) = \left( \sum_{i=1}^{d} \left( \sqrt{p_i} - \sqrt{q_i} \right)^2 \right)^{1/2}.$$

The Hellinger distance, which takes values between 0 and $\sqrt{2}$, is related to the total variation
distance by $d_H^2(p, q) \le 2 d_{TV}(p, q) \le 2 d_H(p, q)$.
For completeness, we should note that many of the distance metrics above have a quantum
analogue. The goal of distance measures for quantum mixed states is to quantify by how much
the quantum probability distributions of the two quantum states differ (see Chapter 2 for a
short review of quantum probability).


Definition 9.6 The trace distance between two mixed states $\rho$ and $\sigma$ is defined as

$$D_{tr}(\rho, \sigma) = \frac{1}{2} \|\rho - \sigma\|_1 = \frac{1}{2} \mathrm{tr}\left( \sqrt{(\rho - \sigma)^\dagger (\rho - \sigma)} \right).$$

The trace distance, which takes values between 0 and 1, can be viewed as the quantum analogue
of the total variation distance, in that $D_{tr}$ calculates the maximum probability that two states $\rho$
and $\sigma$ can be discriminated by measurements.


Definition 9.7 The Hilbert-Schmidt distance between $\rho$ and $\sigma$ is defined as

$$D_{HS}(\rho, \sigma) = \|\rho - \sigma\|_F = \left( \mathrm{tr}\left( (\rho - \sigma)^2 \right) \right)^{1/2},$$

where $\|\cdot\|_F$ is also called the Frobenius norm. The Hilbert-Schmidt distance is the
quantum analogue of the $\ell_2$ distance. It relates to the trace distance by $D_{HS}(\rho, \sigma) \le 2 D_{tr}(\rho, \sigma) \le
\sqrt{d}\, D_{HS}(\rho, \sigma)$.


Definition 9.8 The Bures distance between $\rho$ and $\sigma$ is defined as

$$D_B(\rho, \sigma) = \left( 2 (1 - F(\rho, \sigma)) \right)^{1/2},$$

where $F(\rho, \sigma) = \|\sqrt{\rho} \sqrt{\sigma}\|_1$ is the fidelity between the two mixed states $\rho$ and $\sigma$. The Bures
distance is the quantum analogue of the Hellinger distance. It relates to the trace distance by
$D_B^2(\rho, \sigma) \le 2 D_{tr}(\rho, \sigma) \le 2 D_B(\rho, \sigma)$.
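The quantum distances can be evaluated numerically in the same spirit. The sketch below assumes NumPy is available and compares the two pure single-qubit states $|0\rangle\langle 0|$ and $|+\rangle\langle +|$, chosen only as an example; the helper names are our own.

```python
import numpy as np


def psd_sqrt(rho):
    """Square root of a positive semidefinite Hermitian matrix."""
    vals, vecs = np.linalg.eigh(rho)
    vals = np.clip(vals, 0.0, None)  # remove tiny negative rounding errors
    return (vecs * np.sqrt(vals)) @ vecs.conj().T


def trace_norm(m):
    """Sum of the singular values of m."""
    return float(np.sum(np.linalg.svd(m, compute_uv=False)))


def trace_distance(rho, sigma):
    return 0.5 * trace_norm(rho - sigma)


def hilbert_schmidt(rho, sigma):
    return float(np.linalg.norm(rho - sigma, ord="fro"))


def fidelity(rho, sigma):
    return trace_norm(psd_sqrt(rho) @ psd_sqrt(sigma))


def bures(rho, sigma):
    return float(np.sqrt(max(0.0, 2.0 * (1.0 - fidelity(rho, sigma)))))


rho = np.array([[1, 0], [0, 0]], dtype=complex)          # |0><0|
sigma = 0.5 * np.array([[1, 1], [1, 1]], dtype=complex)  # |+><+|

d_tr = trace_distance(rho, sigma)
d_hs = hilbert_schmidt(rho, sigma)
d_b = bures(rho, sigma)
print(d_tr, d_hs, d_b)
```

For this pair the inequality $2 D_{tr} \le \sqrt{d}\, D_{HS}$ holds with equality, so any numerical comparison should allow for floating-point tolerance.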

With the distance metrics defined, we are going to introduce a number of simulation
techniques. In particular, they are all strong simulation techniques. In fact, most simulations
that have been developed are strong simulations, and it remains an exciting open problem to
explore the possibilities of weak simulation of quantum circuits.

The leading simulation techniques we choose to cover include: density matrix simulation,
stabilizer formalism, tensor networks, and undirected graphical models.


9.2 DENSITY MATRICES: THE SCHRÖDINGER PICTURE 


In the Schrödinger picture, the evolution of a quantum system can be described by tracking its
quantum state over time, denoted as a time-dependent ket vector $|\psi(t)\rangle$ (or density matrix $\rho(t)$).
Through time, a quantum state is evolved to another by some unitary transformation U:

$$|\psi(0)\rangle \xrightarrow{\text{time}} |\psi(t)\rangle = U |\psi(0)\rangle.$$


As such, one of the most straightforward ways of simulating quantum circuits is explicitly
tracking the transformation of the qubits (in the form of a state vector or density matrix) over
time. More specifically, we can just compute the state vector or density matrix at each stage
of the circuit, by multiplying it with the unitary matrix of each gate, one at a time. As shown in
Chapter 2, density matrix simulation is needed for quantum circuits with intermediate
measurements (which transform pure quantum states to mixed quantum states), because tracking the
state vector over time is not enough for representing mixed states. Furthermore, noisy quantum
circuits often need density matrix simulations. In fact, when we analyzed the quantum algorithms
in Chapter 3, we were using this strategy to compute the output of the algorithm, acting
as a strong simulator by tracking the quantum state over time. From this point forward in this
section, for simplicity, we are mostly concerned with using state vectors to simulate a quantum
circuit U strongly.¹ For an n-qubit quantum circuit, suppose we start with the $|0 \ldots 0\rangle$ state and
want to calculate the probability of an outcome $x \in \{0, 1\}^n$:

$$p(x) = |\langle x | U | 0 \ldots 0 \rangle|^2.$$

The circuit U is represented by m quantum gates, each acting on n qubits, so $U = U_m \cdots U_2 U_1$.
Each $U_i$ is a $2^n \times 2^n$ matrix with complex entries.

Let us analyze the space and time cost of naively applying the matrix multiplication. To
begin with, the size of a state vector (i.e., the dimension of the vector) grows exponentially with the
number of qubits to $2^n$, so there are $2^n$ complex amplitudes to keep track of. For a
depth-m circuit, a naive strategy needs to apply matrix multiplication to the state vector m times.
If the unitary gate is a $2^n \times 2^n$ matrix (i.e., an n-qubit gate), using naive matrix multiplication, we
are performing about $O(2^{2n})$ multiplications of two complex amplitudes, and summing them
row by row. So the naive algorithm requires a time cost of $O(m 2^{2n})$. In every iteration, we need
space for $2^{2n}$ complex numbers (stored with some floating point precision) and $2 \cdot 2^n$ numbers
for the input and output state vectors. Overall, the space cost is also $O(m 2^{2n})$. This is a daunting
scaling, both in time and space, for a classical computer. Suppose we can efficiently synthesize
the n-qubit unitary into a sequence of one- and two-qubit gates of length L; one can then reduce the
space cost to $O(L + 2^n)$ and the time cost to $O(L 2^n)$, still exponential in the number of qubits.
Further optimizations such as data compression and distributed algorithms have been applied to
reduce the cost on a classical computer [356, 357]. Due to this exponential scaling in space and
time cost, simulations for more than 65 qubits are shown to be challenging, even on today's
state-of-the-art supercomputers.
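The reduced cost of $O(L 2^n)$ comes from never forming a $2^n \times 2^n$ matrix: each one- or two-qubit gate is applied by a single sweep over the $2^n$ amplitudes. The sketch below is a minimal pure-Python illustration under that assumption, with qubit 0 taken as the leftmost bit of the index; the function names are our own and the circuit prepares a three-qubit GHZ state as an example.

```python
import math


def apply_single_qubit_gate(state, gate, target, n):
    """Apply a 2x2 gate to qubit `target` (0 = leftmost bit) in O(2^n) time."""
    new_state = list(state)
    stride = 1 << (n - 1 - target)
    for index in range(len(state)):
        if index & stride == 0:  # bit of `target` is 0 in this index
            a0, a1 = state[index], state[index | stride]
            new_state[index] = gate[0][0] * a0 + gate[0][1] * a1
            new_state[index | stride] = gate[1][0] * a0 + gate[1][1] * a1
    return new_state


def apply_cnot(state, control, target, n):
    """Swap the amplitudes of target = 0 and target = 1 wherever control = 1."""
    new_state = list(state)
    c = 1 << (n - 1 - control)
    t = 1 << (n - 1 - target)
    for index in range(len(state)):
        if index & c and not index & t:
            new_state[index], new_state[index | t] = state[index | t], state[index]
    return new_state


H = [[1 / math.sqrt(2), 1 / math.sqrt(2)],
     [1 / math.sqrt(2), -1 / math.sqrt(2)]]

n = 3
state = [0.0] * (2 ** n)
state[0] = 1.0                       # |000>
state = apply_single_qubit_gate(state, H, 0, n)
state = apply_cnot(state, 0, 1, n)
state = apply_cnot(state, 1, 2, n)

probabilities = [abs(a) ** 2 for a in state]
print(probabilities)
```

The memory footprint is dominated by the two copies of the state vector, in line with the $O(L + 2^n)$ space cost quoted above.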


¹In general, density matrices are necessary if intermediate measurements or noise are present in the target quantum
circuits.

Simulating Product States

Let us now analyze a special case where the quantum states are product states throughout the
circuit. As we will see in the following argument, if we are willing to forgo quantum
entanglement, then a classical computer can simulate the quantum circuit efficiently. Suppose we have
a product quantum state:

$$|\psi\rangle = |q_1\rangle \otimes |q_2\rangle \otimes \cdots \otimes |q_n\rangle,$$

where each qubit carries two amplitudes, e.g., $|q_i\rangle = \alpha_i |0\rangle + \beta_i |1\rangle$. There are only 2n
amplitudes to keep track of, rather than all $2^n$ of them. As for the unitary matrix, note that
being in a product state also means that if the gate is local you only need to update the local pair
of amplitudes. (To see why you only need to update locally, note that you are now storing
the amplitudes in the above way with 2n complex numbers, instead of the original ket vector.)
As such, we have seen that entanglement is a key resource for quantum computing power. It is
important to note here, though, that entanglement is not a sufficient
condition, as there exist circuits that entangle all qubits, yet are still efficiently simulable on a
classical computer [358-361].
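A minimal sketch of this idea stores one pair of amplitudes per qubit and updates only the pair touched by a gate. It assumes every gate is a single-qubit gate, so the state remains a product state; the function names and the choice of 50 qubits are illustrative.

```python
import math


def apply_local_gate(qubits, gate, target):
    """Update only the two amplitudes of qubit `target`.

    qubits -- list of (alpha, beta) pairs, one per qubit
    """
    alpha, beta = qubits[target]
    updated = list(qubits)
    updated[target] = (gate[0][0] * alpha + gate[0][1] * beta,
                       gate[1][0] * alpha + gate[1][1] * beta)
    return updated


def outcome_probability(qubits, bits):
    """Probability of a bit-string: a product of single-qubit probabilities."""
    prob = 1.0
    for (alpha, beta), b in zip(qubits, bits):
        prob *= abs(beta) ** 2 if b == "1" else abs(alpha) ** 2
    return prob


H = [[1 / math.sqrt(2), 1 / math.sqrt(2)],
     [1 / math.sqrt(2), -1 / math.sqrt(2)]]

n = 50
qubits = [(1.0, 0.0)] * n           # |0...0>, stored with 2n numbers
for target in range(n):
    qubits = apply_local_gate(qubits, H, target)

p_all_zero = outcome_probability(qubits, "0" * n)
print(p_all_zero)
```

A full state vector for 50 qubits would need $2^{50}$ amplitudes, whereas this representation needs only 100 numbers; it breaks down as soon as an entangling gate is applied.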


Sum-Over-Path Approach 

Using the path integral technique introduced in Chapter 3, we will show that we can simulate
a quantum circuit with polynomial space on a classical computer, at the cost of (possibly
exponential) time complexity. In particular, in the context of computational complexity as defined in
Chapter 3, the statement is equivalent to BQP $\subseteq$ PSPACE (polynomial space). To begin with,
we rewrite the matrix multiplication with the sum-over-paths technique:

$$\langle x | U | 0 \ldots 0 \rangle = \sum_{x_i \in \{0,1\}^n,\ i \in \{1, \ldots, m-1\}} \langle x | U_m | x_{m-1} \rangle \langle x_{m-1} | U_{m-1} | x_{m-2} \rangle \cdots \langle x_1 | U_1 | 0 \ldots 0 \rangle.$$

In total, there are $2^{n(m-1)}$ paths to sum over, leading to a time cost of $O(m 2^{n(m-1)})$.
Due to the tree structure in the sum-over-path construction, we have to store on the order of $nm \log m$
complex numbers along the paths. It is important not to neglect the space cost for storing
the unitaries of the quantum circuit. We again resort to the argument on efficient synthesis of
the $2^n \times 2^n$ sized U into a sequence of single- and two-qubit gates, so that $\langle x_i | U_i | x_{i-1} \rangle$ in the
summation can be efficiently computed by storing the few complex entries of $U_i$, requiring
space polynomial in the precision of the complex numbers. So the space cost overall is reduced
to a reasonable $O(nm \log m)$. This concludes our proof of BQP $\subseteq$ PSPACE.
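The sum over paths can be written as a short recursion whose memory use grows only with the circuit depth. The sketch below assumes the circuit is given as a list of one- and two-qubit gates, each with an explicit small matrix; it skips intermediate bit-strings whose matrix element is zero, so it visits fewer than $2^{n(m-1)}$ paths. The function and variable names are our own.

```python
import math


def amplitude(gates, x, layer=None):
    """Compute <x| U_m ... U_1 |0...0> by summing over paths recursively.

    gates -- list of (matrix, qubits); matrix is 2^k x 2^k for k = len(qubits)
    x     -- tuple of bits describing the output bit-string
    """
    if layer is None:
        layer = len(gates)
    if layer == 0:
        return 1.0 if not any(x) else 0.0
    matrix, qubits = gates[layer - 1]
    # Row index: the bits of x on the qubits this gate acts on.
    row = 0
    for q in qubits:
        row = (row << 1) | x[q]
    total = 0.0
    for col in range(1 << len(qubits)):
        entry = matrix[row][col]
        if entry == 0:
            continue
        # The previous bit-string agrees with x outside the gate's qubits.
        prev = list(x)
        for position, q in enumerate(qubits):
            prev[q] = (col >> (len(qubits) - 1 - position)) & 1
        total += entry * amplitude(gates, tuple(prev), layer - 1)
    return total


s = 1 / math.sqrt(2)
HADAMARD = [[s, s], [s, -s]]
CNOT = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 0, 1], [0, 0, 1, 0]]

# Bell-state circuit: H on qubit 0, then CNOT from qubit 0 to qubit 1.
gates = [(HADAMARD, (0,)), (CNOT, (0, 1))]
for bits in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(bits, amplitude(gates, bits))
```

At any moment the recursion holds one partial path of length at most m, each entry being an n-bit string, which is the polynomial-space behavior the argument relies on; the running time can still be exponential.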


9.3 STABILIZER FORMALISM: THE HEISENBERG 
PICTURE 


In the Heisenberg picture, the evolution of a quantum system can be described by tracking its
operators over time. Suppose we consider an operator A that is an observable. In the
Schrödinger picture, we calculate the expected value of the observable by conjugating the final
state vector: $\langle A \rangle = \langle \psi(t) | A | \psi(t) \rangle$. In contrast to the Schrödinger picture, in which the state
vector changes over time, the Heisenberg picture is an equivalent formalism of quantum mechanics
in which the state vector is kept constant at the initial value $|\psi\rangle = |\psi(0)\rangle$, and the operator
evolves over time: $\langle A \rangle = \langle \psi(0) | U^\dagger(t) A U(t) | \psi(0) \rangle = \langle \psi(0) | A(t) | \psi(0) \rangle$. In other words, we can
track the evolution of a quantum system by its time-dependent operator A(t):

$$A(0) \to A(t) = U^\dagger A(0) U.$$

Classical simulation based on the stabilizer formalism corresponds to the Heisenberg picture
of quantum computation. We have seen in Chapter 8 that the stabilizer formalism is useful for
quantum error correction; now we shall see its application in the classical simulation of a subclass
of quantum circuits via the Gottesman-Knill theorem. Before we start our discussion, we want
to emphasize that we are now considering efficient simulations of a restricted class of quantum
circuits. When we say "efficient," we mean O(poly(n)) space and time costs, where n is the
number of qubits, and "restricted class" here means the stabilizer circuits (also known as the
Clifford circuits).

The key idea is to find a compact representation of a quantum state, together with an 
efficient update rule for transforming the quantum states. We will start by defining the class of 
circuits in consideration. 


Definition 9.9 A quantum gate is a stabilizer gate if it is an element of the Clifford group
$\mathcal{S} = \langle \mathrm{CNOT}, H, S \rangle$. In other words, it is a product of generators $g \in \{\mathrm{CNOT}, H, S\}$.


For example, all Pauli gates belong to this set: $X = HZH$, $Y = iXZ$, $Z = SS$. Notice that
a stabilizer gate $S$ conjugates a gate from the Pauli group back to the Pauli group: $S P_i S^\dagger = P_j$
up to a phase factor, where $P_i, P_j \in \mathcal{P}$.
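These identities, and the fact that conjugation by $H$ and $S$ maps Pauli matrices to Pauli matrices up to a phase, can be verified directly. The following is a minimal numerical sketch; the helper names `conjugate` and `pauli_label` are introduced here for illustration only.

```python
import numpy as np

X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)
H = np.array([[1, 1], [1, -1]], dtype=complex) / np.sqrt(2)
S = np.array([[1, 0], [0, 1j]], dtype=complex)

PAULIS = {"X": X, "Y": Y, "Z": Z}
GATES = {"H": H, "S": S}

def conjugate(gate, pauli):
    """Return gate * pauli * gate^dagger."""
    return gate @ pauli @ gate.conj().T

def pauli_label(matrix):
    """Return (phase, name) if matrix equals phase * Pauli, else None."""
    for name, pauli in PAULIS.items():
        for phase in (1, -1, 1j, -1j):
            if np.allclose(matrix, phase * pauli):
                return phase, name
    return None

# The three identities quoted in the text.
identities_hold = (
    np.allclose(X, H @ Z @ H)
    and np.allclose(Y, 1j * X @ Z)
    and np.allclose(Z, S @ S)
)

# Conjugation table: every entry is again a Pauli matrix, up to a phase.
table = {
    (g, p): pauli_label(conjugate(GATES[g], PAULIS[p]))
    for g in GATES
    for p in PAULIS
}
for (g, p), image in table.items():
    print(f"{g} {p} {g}^dagger = {image}")
```

Every entry of the table is a signed Pauli matrix, which is the closure property the definition relies on.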


Definition 9.10 A state is a stabilizer state if it can be prepared from $|00\ldots0\rangle$ using stabilizer
gates.


For example, we can list the single-qubit stabilizer states (6 of them): 


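$$|0\rangle,\ |1\rangle,\ |+\rangle,\ |-\rangle,\ |i\rangle,\ |-i\rangle, \quad \text{where } |\pm\rangle = \frac{|0\rangle \pm |1\rangle}{\sqrt{2}} \text{ and } |\pm i\rangle = \frac{|0\rangle \pm i|1\rangle}{\sqrt{2}}.$$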


Definition 9.11 A quantum circuit is called a stabilizer circuit if it is made of stabilizer gates
applied on input state $|00\ldots0\rangle$, and measurements in the computational basis.


Definition 9.12 A state $|\psi\rangle$ is stabilized by a quantum circuit $U$ if $U|\psi\rangle = |\psi\rangle$.


We consider a few examples where the states are stabilized by the Pauli gates. 


* $I$ stabilizes everything: $I|\psi\rangle = |\psi\rangle$.

* $X$ stabilizes $|+\rangle$: $X|+\rangle = X\left(\frac{|0\rangle + |1\rangle}{\sqrt{2}}\right) = \frac{|1\rangle + |0\rangle}{\sqrt{2}} = |+\rangle$.

* $-Z$ stabilizes $|1\rangle$: $-Z|1\rangle = -(-|1\rangle) = |1\rangle$.

* $X \otimes I = XI$ stabilizes $|+\rangle \otimes |0\rangle$, $|+\rangle \otimes |+\rangle$, ....


In order to uniquely represent a state, say $|\psi\rangle = |+\rangle \otimes |0\rangle$, we can use its stabilizer(s). In
the example $|\psi\rangle$, we have already found it to be stabilized by $XI$. Since there are multiple states
that are stabilized by $XI$, we ought to find its other stabilizers in order to avoid ambiguity. In other
words, we want to find a set $S$ such that $|\psi\rangle$ is uniquely determined when all $s \in S$ simultaneously
stabilize $|\psi\rangle$.

In the example $|\psi\rangle = |+\rangle \otimes |0\rangle$, this set is $\{II, XI, IZ, XZ\}$. Note that $II$ stabilizes all
two-qubit states, and $XZ$ is the product of $XI$ and $IZ$. So essentially, to uniquely represent our
$|+\rangle \otimes |0\rangle$ state, we only need to keep track of the two stabilizers $XI$ and $IZ$. Remarkably, the
Gottesman-Knill theorem says that the number of stabilizers we need to keep track of is only
$O(n)$.
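The claim that $XI$ alone is ambiguous while $XI$ and $IZ$ together pin down the state can be checked with projectors onto the $+1$ eigenspaces. This is a small numerical sketch; using the trace of a projector to read off the dimension of the stabilized subspace is a choice made here for illustration.

```python
import numpy as np

I2 = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)

plus = np.array([1, 1], dtype=complex) / np.sqrt(2)
zero = np.array([1, 0], dtype=complex)
psi = np.kron(plus, zero)                      # |+> (x) |0>

stabilizers = {
    "II": np.kron(I2, I2),
    "XI": np.kron(X, I2),
    "IZ": np.kron(I2, Z),
    "XZ": np.kron(X, Z),
}
all_stabilize = all(np.allclose(M @ psi, psi) for M in stabilizers.values())

# Projector onto the +1 eigenspace of a stabilizer M is (I + M) / 2;
# its trace equals the dimension of that eigenspace.
EYE = np.eye(4, dtype=complex)
P_XI = (EYE + stabilizers["XI"]) / 2
P_IZ = (EYE + stabilizers["IZ"]) / 2

dim_XI_only = int(round(np.trace(P_XI).real))        # XI alone leaves 2 dimensions
P_joint = P_XI @ P_IZ
dim_joint = int(round(np.trace(P_joint).real))       # XI and IZ together leave 1

print(all_stabilize, dim_XI_only, dim_joint)
```

The joint projector equals $|\psi\rangle\langle\psi|$, so the two generators determine the state up to a global phase.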


Theorem 9.13 (Gottesman-Knill theorem [362]) There exists a classical algorithm that
simulates any stabilizer circuit in polynomial time.


In simulation, we no longer need to keep track of the amplitudes of the state vector;
rather, we can keep track of the stabilizer operators. Let us now examine how to update the
stabilizer group when applying a quantum gate:


$$|+\rangle \otimes |0\rangle \xrightarrow{I \otimes H} |+\rangle \otimes |+\rangle.$$

We can find the set of operators that simultaneously stabilizes the initial state $|+\rangle|0\rangle$ and the
final state $|+\rangle|+\rangle$, respectively:

$$\{II, XI, IZ, XZ\} \xrightarrow{I \otimes H} \{II, XI, IX, XX\}.$$

More compactly, we can write only the generators (i.e., the minimal subset such that every
element in the set can be obtained from a product of the generators): $\langle XI, IZ\rangle \xrightarrow{I \otimes H} \langle XI, IX\rangle$.

In general, we can track how a quantum system evolves over time by updating its stabilizer
operators:

$$S \xrightarrow{U} U S U^\dagger.$$
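The update $S \to U S U^\dagger$ for the example gate $I \otimes H$ can be reproduced with explicit matrices. This brute-force sketch uses $4 \times 4$ matrices only to confirm the rule on two qubits; it is not how an efficient simulator would store the stabilizers, and the helper name `update` is an assumption made here.

```python
import numpy as np

I2 = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)
H = np.array([[1, 1], [1, -1]], dtype=complex) / np.sqrt(2)

U = np.kron(I2, H)                                  # the gate I (x) H
XI, IZ, IX = np.kron(X, I2), np.kron(I2, Z), np.kron(I2, X)

def update(stabilizer, gate):
    """Conjugation update S -> U S U^dagger."""
    return gate @ stabilizer @ gate.conj().T

new_generators = [update(g, U) for g in (XI, IZ)]

# The evolved state U |+>|0> should be stabilized by the updated generators.
plus = np.array([1, 1], dtype=complex) / np.sqrt(2)
zero = np.array([1, 0], dtype=complex)
final_state = U @ np.kron(plus, zero)
still_stabilized = all(np.allclose(g @ final_state, final_state) for g in new_generators)

print(still_stabilized)
```

The updated generators come out as $XI$ and $IX$, matching the generator update for this example.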
For convenience, we list the update rules for some common Clifford gates: 
* H gate: $X \to Z$, $Z \to X$;

* S gate: $X \to Y$, $Z \to Z$; and

* CNOT gate: $XI \to XX$, $IX \to IX$, $ZI \to ZI$, $IZ \to ZZ$.

Figure 9.1: Graphical representation of tensors and their mathematical definitions.


To complete the explanation of the theorem, we need a few other ingredients [363]:
(i) proving that the number of stabilizer generators scales linearly with the number of qubits;
(ii) demonstrating that measurement is efficient; and (iii) showing that the amplitude
$\langle x|U|00\ldots0\rangle$ can be computed efficiently.

Indeed, one can use the tableau representation to accomplish all of the above-mentioned
tasks. The tableau representation is an $(\ell \times 2n)$ matrix storing information about the stabilizer
generators, where the number of stabilizer generators is $\ell \in O(n)$. One can also show an $O(\mathrm{poly}(n))$-time
procedure to update the tableau for measurements in the computational basis. Details of the
proof can be found in [363].
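To make the tableau idea concrete, here is a minimal sketch of a gate-update routine. It assumes a common binary encoding in which each generator is a row of $2n$ bits (an x-part and a z-part) plus one sign bit, with the bit pair (1, 1) standing for $Y$; the class name `StabilizerTableau` and its methods are hypothetical, and measurement is deliberately left out.

```python
import numpy as np

class StabilizerTableau:
    """Track the n stabilizer generators of an n-qubit state as bits."""

    def __init__(self, n):
        self.n = n
        # x[g, q] = 1 if generator g acts as X or Y on qubit q.
        self.x = np.zeros((n, n), dtype=np.uint8)
        # z[g, q] = 1 if generator g acts as Z or Y on qubit q.
        # |00...0> is stabilized by Z on each qubit.
        self.z = np.eye(n, dtype=np.uint8)
        # Sign bit per generator: 0 means +1, 1 means -1.
        self.r = np.zeros(n, dtype=np.uint8)

    def h(self, a):
        """Hadamard on qubit a: X <-> Z, Y -> -Y."""
        self.r ^= self.x[:, a] & self.z[:, a]
        self.x[:, a], self.z[:, a] = self.z[:, a].copy(), self.x[:, a].copy()

    def s(self, a):
        """Phase gate on qubit a: X -> Y, Y -> -X, Z -> Z."""
        self.r ^= self.x[:, a] & self.z[:, a]
        self.z[:, a] ^= self.x[:, a]

    def cnot(self, a, b):
        """CNOT with control a and target b."""
        self.r ^= self.x[:, a] & self.z[:, b] & (self.x[:, b] ^ self.z[:, a] ^ 1)
        self.x[:, b] ^= self.x[:, a]
        self.z[:, a] ^= self.z[:, b]

    def generators(self):
        """Return the generators as signed Pauli strings."""
        names = {(0, 0): "I", (1, 0): "X", (0, 1): "Z", (1, 1): "Y"}
        rows = []
        for g in range(self.n):
            sign = "-" if self.r[g] else "+"
            body = "".join(
                names[(int(self.x[g, q]), int(self.z[g, q]))] for q in range(self.n)
            )
            rows.append(sign + body)
        return rows

# Prepare |+>|0>, then apply I (x) H as in the running example.
t = StabilizerTableau(2)
t.h(0)
before = t.generators()
t.h(1)
after = t.generators()

# A Bell pair: H on qubit 0 followed by CNOT(0 -> 1).
bell = StabilizerTableau(2)
bell.h(0)
bell.cnot(0, 1)

print(before, after, bell.generators())
```

Storage is $n$ rows of $2n + 1$ bits and each gate touches one or two columns, so both space and per-gate time are polynomial in $n$, in contrast to the $2^n$ amplitudes of a state vector.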


9.4 GRAPHICAL MODELS AND TENSOR NETWORK 


A different class of simulation techniques is based on graphical models. By converting
a quantum circuit into a graphical representation, one hopes to perform the
transformation $|\psi(0)\rangle \to |\psi(t)\rangle$ more efficiently. Let us begin with a technique called tensor network simulation.

A rank-k tensor is a k-dimensional array, that is, a mathematical object
in which an entry is located by k indices. In the graphical representation, a
tensor is a vertex, and the rank of the tensor is represented by the number of edges connected to
the vertex. In other words, each edge represents an index. We can label the edges with the name
of the index. For example, a rank-0 tensor is just a scalar; a rank-1 tensor is a vector (indexed
by the position in the vector); a rank-2 tensor is a matrix (indexed by row number and column
number); a rank-3 tensor is some data structure that is indexed by 3 numbers, e.g., a movie ticket
(indexed by screen number, row number, seat number). In the context of quantum computation,
we can make the equivalences shown in Figure 9.1.

Now we illustrate how to transform a quantum circuit into a graph of tensors. To map
from a quantum circuit to a tensor network, we highlight the following correspondence.


* Qubit state: vector $\to$ 1-d tensor.

* Single-qubit gate: $2 \times 2$ matrix (i.e., qubit input index (column) and qubit output index
(row)) $\to$ 2-d tensor.

* Two-qubit gate: instead of a $4 \times 4$ matrix, we can index an entry by 4 indices, namely
the qubit 1 input, the qubit 1 output, the qubit 2 input, and the qubit 2 output $\to$ 4-d
tensor.

Figure 9.2: Converting from a quantum circuit to a tensor network.


Therefore, we can straightforwardly convert a quantum circuit into a network of tensors
of various ranks, as shown in Figure 9.2.

Furthermore, the final simulation output $\langle y|U|00\ldots0\rangle$ is computed as the product of
the final quantum state $\langle y|$, the quantum circuit $U$, and the initial state $|00\ldots0\rangle$. In terms of
the tensor network, the final product is represented by attaching rank-1 tensors at the beginning
and end of the network derived from Figure 9.2.
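As a small illustration of this construction, the sketch below computes amplitudes $\langle y|U|00\rangle$ for a two-qubit circuit by closing the network with rank-1 tensors. The circuit (a Hadamard on qubit 0 followed by a CNOT), the index letters, and the function name `amplitude` are assumptions chosen here, not the circuit of Figure 9.2.

```python
import numpy as np

ket0 = np.array([1, 0], dtype=complex)                        # rank-1 tensor
H = np.array([[1, 1], [1, -1]], dtype=complex) / np.sqrt(2)   # rank-2 tensor [out, in]

CNOT_matrix = np.array(
    [[1, 0, 0, 0],
     [0, 1, 0, 0],
     [0, 0, 0, 1],
     [0, 0, 1, 0]], dtype=complex)
# Rank-4 tensor with indices [out0, out1, in0, in1].
CNOT = CNOT_matrix.reshape(2, 2, 2, 2)

def amplitude(y):
    """Contract the closed network for <y| CNOT (H (x) I) |00>."""
    basis = np.eye(2, dtype=complex)
    bra0, bra1 = basis[y[0]], basis[y[1]]
    # a, b: circuit inputs; c: wire after H; d, e: circuit outputs.
    value = np.einsum("a,b,ca,decb,d,e->", ket0, ket0, H, CNOT, bra0, bra1)
    return complex(value)

amplitudes = {y: amplitude(y) for y in [(0, 0), (0, 1), (1, 0), (1, 1)]}

# Reference: ordinary matrix-vector simulation of the same circuit.
U = CNOT_matrix @ np.kron(H, np.eye(2))
reference = U[:, 0]

print(amplitudes)
```

Every index in the `einsum` string appears on exactly two tensors, so the network has no open edges and the result is a single number.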


Tensor Network Contraction 
Tensor contraction is a process where we merge two tensors into one, absorbing the common 
edges (i.e., the common indices) between the two tensors. When two tensors share a common 
index, we can contract the corresponding edge by summing over all possible values of that index. 
Indeed, tensor network contraction can be thought of as the multi-dimensional generalization 
of matrix multiplication. 

Given two matrices (i.e., rank-2 tensors) $A$ and $B$, we contract the two tensors using the
definition of matrix multiplication. Let us denote by $A_{ij}$ the entry in the $i$th row and $j$th column of matrix $A$; similarly
for $B$. Now, we can contract the edge $j$:

$$\sum_j A_{ij} B_{jk} = C_{ik}.$$

Note that the end result is another matrix $C$ indexed by $i$ and $k$.

Figure 9.3: Contracting two rank-2 tensors, $A$ and $B$, is equivalent to the matrix multiplication
$C = AB$.

Figure 9.4: Part of a generic tensor network, consisting of ten rank-4 tensors and four rank-1
tensors.
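The contraction of a shared edge can be written out with `numpy.einsum`. In this sketch the random test tensors and the second example, in which two rank-3 tensors share two edges, are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((2, 2))
B = rng.standard_normal((2, 2))

# Contract the shared index j: C_ik = sum_j A_ij B_jk.
C = np.einsum("ij,jk->ik", A, B)

# The same contraction as an explicit sum over the shared index.
C_loop = np.zeros((2, 2))
for i in range(2):
    for k in range(2):
        for j in range(2):
            C_loop[i, k] += A[i, j] * B[j, k]

# Two rank-3 tensors sharing two edges (a and b): each entry of the
# result is a sum over 2 * 2 = 4 terms.
T = rng.standard_normal((2, 2, 2))
W = rng.standard_normal((2, 2, 2))
R = np.einsum("iab,abk->ik", T, W)
R_check = np.tensordot(T, W, axes=([1, 2], [0, 1]))

print(np.allclose(C, A @ B), np.allclose(R, R_check))
```

The indices that appear in both inputs but not in the output are the contracted edges; the remaining indices label the open edges of the merged tensor.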


Complexity of Contraction 
Contracting one edge takes $O(\exp(d))$ time, where $d$ is the max rank of the tensors involved. To
see this, notice that in our case each index can take the value 0 or 1, so contracting the edge
corresponding to that index yields a summation with two terms. Now consider the case where
two tensors are connected by $d$ edges: combining these two tensors means summing over
$d$ different indices, each of which can take two values, so the summation has $2^d$ terms in total.
Normally, in the simulation of a quantum circuit, the goal is to contract a tensor network into a
single rank-0 tensor. Consider part of a tensor network shown in Figure 9.4. The first problem
we encounter is deciding the order of edges to contract. Due to the structure of the network,
some orderings may have a lower overall cost than others.
If one could contract some tensors in parallel, then it is the maximum rank of the
tensors encountered during the process of contraction that determines the complexity. So, can
we avoid encountering large-rank tensors by contracting the given graph cleverly? The answer is
yes. To see this, consider the two different contraction strategies in Figures 9.5 and 9.6.

Figure 9.5: First strategy of contraction that results in two rank-12 tensors and four rank-1 tensors.
Then contracting the two rank-12 tensors involves contracting 5 edges at once, by summing
over $2^5$ terms.

Figure 9.6: Second strategy of contraction that results in five rank-6 tensors and four rank-1
tensors. Then contracting the five rank-6 tensors involves contracting from left to right 2 edges
at a time, by summing over $2^2$ terms four times.

Observe that the first strategy in Figure 9.5 has max-rank 5, while the second strategy 
in Figure 9.6 has max-rank 2; the two strategies differ in their contraction order. It is therefore 
important to specify a strategic contraction order that yields low max-rank. 
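The effect of contraction order can be seen without any numerics by tracking only which indices each intermediate tensor carries. The sketch below uses a toy chain network (a vector, three matrices, and a vector) that is an assumption made for illustration, not the network of Figure 9.4; the function name `max_intermediate_rank` is hypothetical, and the code assumes each index is shared by at most two tensors.

```python
def max_intermediate_rank(tensors, order):
    """Return the largest rank of any tensor produced while contracting.

    tensors: dict mapping a tensor name to its index labels.
    order:   list of (keep, absorb) pairs; `absorb` is merged into `keep`.
    """
    live = {name: frozenset(indices) for name, indices in tensors.items()}
    worst = 0
    for keep, absorb in order:
        # Shared indices are summed away; all other indices stay open.
        merged = live[keep] ^ live.pop(absorb)
        live[keep] = merged
        worst = max(worst, len(merged))
    return worst

# Toy chain: v(a) - M1(a,b) - M2(b,c) - M3(c,d) - w(d).
# Each character of a string is one index label.
network = {"v": "a", "M1": "ab", "M2": "bc", "M3": "cd", "w": "d"}

# Sweep from one end: every intermediate is a vector.
sweep = [("v", "M1"), ("v", "M2"), ("v", "M3"), ("v", "w")]

# Merge two non-adjacent matrices first: this creates a rank-4 tensor.
scattered = [("M1", "M3"), ("M1", "M2"), ("M1", "v"), ("M1", "w")]

print(max_intermediate_rank(network, sweep))
print(max_intermediate_rank(network, scattered))
```

Both orders produce the same final scalar, but the second one passes through a much larger intermediate tensor, which is what a good contraction order avoids.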

There are a number of techniques that can help keep the rank low [364]. For instance,
splitting a tensor into a number of smaller ones (e.g., via singular value decomposition) allows 
for more degrees of freedom in the contraction. One can also make approximations by dropping 
some indices regarded as unimportant. 
Figure 9.7: Converting from a quantum circuit, to a tensor network, then to an undirected
graphical model. Note that the bottom-right panel shows the reduced graph obtained using a technique called
variable elimination.


Undirected Graphical Model 
One way to further reduce the cost of contracting a tensor network is to use the undirected
graphical model [365]. In essence, it combines ideas from the tensor network and Feynman’s
sum-over-path approach, as seen in Chapter 3.

We start by examining an example undirected graphical model, derived from a quantum
circuit, as shown in Figure 9.7.

Observe that the CZ gate is diagonal, suggesting that the rank-4 tensor for the CZ gate is redundant.
We can further simplify the graph by writing down the path sums for the circuit. Suppose
we want to calculate the amplitude for the outcome $|11\rangle$:

$$\begin{aligned}
\langle 11|U|00\rangle &= \sum_{i_3, j_3, i_2, j_2} \langle 11|H \otimes H|i_3 j_3\rangle \langle i_3 j_3|CZ|i_2 j_2\rangle \langle i_2 j_2|H \otimes H|00\rangle \\
&= \sum_{i_2, j_2} \langle 11|H \otimes H|i_2 j_2\rangle \langle i_2 j_2|CZ|i_2 j_2\rangle \langle i_2 j_2|H \otimes H|00\rangle .
\end{aligned}$$

Note that $\langle i_3 j_3|CZ|i_2 j_2\rangle$ is non-zero if and only if $i_3 j_3 = i_2 j_2$. Therefore, we can simplify the
graph by identifying diagonal gates and replacing them with the corresponding components in
Figure 9.8. This technique, in many cases, can drastically reduce the number of indices we need
to sum over, and thus yield a more efficient simulation of the same circuit.
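The reduction from four summation indices to two can be confirmed numerically for the circuit $U = (H \otimes H)\, CZ\, (H \otimes H)$ used in the path sum. This is a minimal sketch; the helper `idx`, which maps a pair of bits to a row or column of a $4 \times 4$ matrix, is introduced here for illustration.

```python
import numpy as np
from itertools import product

H = np.array([[1, 1], [1, -1]], dtype=complex) / np.sqrt(2)
HH = np.kron(H, H)
CZ = np.diag([1, 1, 1, -1]).astype(complex)

def idx(i, j):
    """Map the two-bit label (i, j) to a row or column index."""
    return 2 * i + j

out, inp = idx(1, 1), idx(0, 0)
bits = (0, 1)

# Full path sum over (i3, j3, i2, j2): 16 terms.
full = sum(
    HH[out, idx(i3, j3)] * CZ[idx(i3, j3), idx(i2, j2)] * HH[idx(i2, j2), inp]
    for i3, j3, i2, j2 in product(bits, repeat=4)
)

# Reduced path sum using the fact that CZ is diagonal: 4 terms.
reduced = sum(
    HH[out, idx(i2, j2)] * CZ[idx(i2, j2), idx(i2, j2)] * HH[idx(i2, j2), inp]
    for i2, j2 in product(bits, repeat=2)
)

# Reference value from plain matrix multiplication.
direct = (HH @ CZ @ HH)[out, inp]

print(full, reduced, direct)
```

All three values agree, while the reduced sum visits a quarter of the paths; the saving grows with the number of diagonal gates in the circuit.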


Figure 9.8: In the undirected graphical model, diagonal gates have simplified graph components
with fewer indices to sum over.


9.5 SUMMARY AND OUTLOOK


Simulators have always played critical roles in systems design. For classical computers, we often
bootstrap from simulators to hardware, and from hardware to larger hardware. Simulation
can model the trajectory of an ideal computation as well as the effects of noise. It remains a
fundamental challenge to implement (most likely partial) simulation of quantum processes efficiently
and scalably.


Further Reading 
Simulating realistic models of noise is non-trivial. Advanced techniques have been developed to
balance simulability and efficiency. For instance, [350, 366] extend noise simulation
from Pauli channels to Clifford channels, and [367, 368] apply quasi-probability approximations.

Similarly, for circuit simulation, methods [180, 369] have been developed that extend
the Gottesman-Knill theorem to cover more ground (i.e., Clifford gates and a small number of T
gates) at the cost of increased complexity.
CHAPTER 10


Concluding Remarks 


The idea of using quantum mechanics, the laws that govern all fundamental particles in the uni- 
verse, to process information has revolutionized the theory of computing. And soon, nearly half 
of a century after its first proposal, a practical quantum computer may finally be built. Quantum 
machines may soon be capable of performing calculations in chemistry, physics, and other fields 
that are extremely difficult or even impossible for today's conventional computers. Yet a signif- 
icant gap exists between the theory of quantum algorithms and the devices that will support 
them. Architects and systems researchers are needed to fill this gap, designing machines and 
software tools that will efficiently map quantum applications to the constraints of real physical 
machines. 

In this book, we put most emphasis on the design of NISQ computer systems. But it
is important not to lose sight of the long-term goal of realizing large-scale fault-tolerant (FT)
quantum computers. Optimizing for NISQ systems may appear overwhelming; some even argue
that managing $N$ qubits requires precise control over $O(2^N)$ continuous variables (i.e., complex
amplitudes). However, that is not the case, for the following two reasons. (i) Nature is kind
enough to let us vary $O(2^N)$ amplitudes via $O(4N)$ knobs, that is, $I$, $X$, $Y$, $Z$ controls for each
qubit. The linear number of control parameters is much more manageable than an exponential
one. (ii) These knobs can be digital once we are beyond the NISQ era. The lessons we learn from
physically driving qubits with analog pulses will pave the way for scalable fault-tolerant systems
in the future. The theory of QEC beautifully discretizes noise and protects programs against
errors. There is so far no fundamental reason to believe that such FT machines cannot be built.


Quantum Computers can be Digital 

It is tempting to view quantum computing as an analog enterprise, with its exponentially com- 
plex superposition and probabilistic outcomes of measurement. Remarkably, most quantum 
computer designs follow a digital discipline (other than the D-Wave quantum annealer, which 
itself uses a fixed set of control values during its analog annealing process). In particular, this 
is accomplished through quantum error correction codes and the measurement of error syndromes 
through ancilla qubits (scratch qubits). Imagine a three-qubit quantum majority code in which
a logical “0” is encoded as “000” and a logical “1” is encoded as “111.” Just as with a classical
majority code, a single bit-flip error can be corrected by restoring to the majority value. Unlike
a classical code, however, we cannot directly measure the qubits, or their quantum state will be
destroyed. Although the errors to the qubits are actually continuous, the effect of measuring the
ancilla in syndrome measurements is to discretize the errors, as well as inform us whether an error
occurred so that it can be corrected. With this methodology, quantum states are restored in
a modular way for even a large quantum computer. Furthermore, operations on error-corrected
qubits can be viewed as digital rather than analog, and only a small number of universal operations
are needed for universal quantum computation. The standard set consists of the Hadamard
gate (H), π/8-phase gate (T), and controlled-NOT gate (CNOT). Through careful design and
engineering, error correction codes and this small set of precise operations will lead to machines
that could support practical quantum computation.
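The discretization of a continuous error by syndrome measurement can be illustrated on the three-qubit majority code. The sketch below makes several simplifying assumptions that are not in the text: the two parity checks $Z_1Z_2$ and $Z_2Z_3$ are modeled directly as projective measurements rather than through ancilla qubits, the error is a small $X$-rotation by an arbitrary angle on the middle qubit, and all names are hypothetical.

```python
import numpy as np

I2 = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)

def on_qubits(ops):
    """Tensor product of three single-qubit operators."""
    out = np.eye(1, dtype=complex)
    for op in ops:
        out = np.kron(out, op)
    return out

EYE = np.eye(8, dtype=complex)
Z1Z2 = on_qubits([Z, Z, I2])     # parity check on qubits 1 and 2
Z2Z3 = on_qubits([I2, Z, Z])     # parity check on qubits 2 and 3
X_ON = [on_qubits([X, I2, I2]), on_qubits([I2, X, I2]), on_qubits([I2, I2, X])]

# Syndrome -> index of the qubit to flip back (None means no error).
CORRECTION = {(1, 1): None, (-1, 1): 0, (-1, -1): 1, (1, -1): 2}

def syndrome_branches(state):
    """Return {syndrome: (probability, post-measurement state)}."""
    branches = {}
    for s1 in (1, -1):
        for s2 in (1, -1):
            projector = (EYE + s1 * Z1Z2) @ (EYE + s2 * Z2Z3) / 4
            projected = projector @ state
            prob = np.vdot(projected, projected).real
            if prob > 1e-12:
                branches[(s1, s2)] = (prob, projected / np.sqrt(prob))
    return branches

def correct(syndrome, state):
    qubit = CORRECTION[syndrome]
    return state if qubit is None else X_ON[qubit] @ state

# Logical state alpha|000> + beta|111>.
alpha, beta = 0.6, 0.8
logical = np.zeros(8, dtype=complex)
logical[0b000], logical[0b111] = alpha, beta

# Continuous error: a small X-rotation on the middle qubit.
theta = 0.3
error = np.cos(theta / 2) * EYE - 1j * np.sin(theta / 2) * X_ON[1]
branches = syndrome_branches(error @ logical)

fidelities = {
    s: abs(np.vdot(logical, correct(s, post))) ** 2
    for s, (prob, post) in branches.items()
}
print({s: round(p, 4) for s, (p, _) in branches.items()}, fidelities)
```

Only two syndromes occur: "no error" with probability $\cos^2(\theta/2)$ and "middle qubit flipped" with probability $\sin^2(\theta/2)$. In either branch the corrected state matches the logical state up to a global phase, and the amplitudes $\alpha$ and $\beta$ are never measured.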

It is true that quantum error correction codes have historically required enormous over- 
head that would be impractical for the foreseeable future. However, this overhead is constantly 
being reduced through improvements in error-corrected gate methods and reduction in the 
physical error rates. It is expected that qubit-optimized instances of the surface code [22] will
demonstrate quantum error correction with 20 qubits in the near term. Surface codes are topological
codes that are under-constrained (they also only require near-neighbor 2D qubit connectivity).
They are under-constrained in that error syndromes do not uniquely determine the
pattern of physical errors that actually occurred. Instead, a maximum likelihood calculation must 
be computed offline to determine which errors to correct. Intuitively, this under-constrained 
coding allows more errors to be corrected by fewer physical qubits. 


Effective Error-Mitigation Techniques

High physical error rates in quantum devices can lead to high error-correction overhead, even 
for surface codes. Physical error-mitigation techniques, however, promise to make devices more 
reliable and make low-overhead error-correction codes possible. These error-mitigation tech- 
niques rely on examining the physical basis of the error. For most solid-state systems, the qubit 
is designed from more primitive physical elements. An active area of research is combining noisy 
qubits to generate less noisy qubits at the physical level. One example is a proposed four-element 
superconducting ensemble [370], in which noise continuously transfers from two transmons to 
two resonant cavities. The noise is then removed from the cavities through a combination of con- 
trol pulses and dissipation. It is expected that this technique can improve the effective logical 
qubit lifetimes against photon losses and dephasing error by a factor of more than 40. Another
method for physical error mitigation is to use some qubits not for computation but to control
the noise source. Trapped-ion machines rely on the shared ion motion to perform two-qubit gates. A
“cooling ion” of a different species can be used to remove noise in the motion to improve the
gates between data ions.

In addition to enabling more scalable error-corrected quantum computing, error-mitigation
techniques will likely be effective enough to allow small machines of 100-1000 qubits
to run some applications without error correction. Additionally, some application-level error
correction is possible. For example, an encoding of quantum chemistry problems, called Generalized
Superfast [371], can correct for a single qubit error. This encoding is one of the most
efficient discretizations of fermionic quantum simulation, and thus its error-correction properties
come at essentially no extra overhead.

Overall, the outlook for quantum computation is promising due to the combination of 
digital modular design, error-correction, error-mitigation, and application-level fault-tolerance. 
Hardware continues to scale in performance, where the most recent example is the ion-trap based 
quantum computer by IonQ demonstrating limited gate operation on up to 79 qubits [56] with 
fully connected, high fidelity entangling operations on 11 qubits at a time. Additionally, algo- 
rithmic and compiler optimizations show signs of orders-of-magnitude reductions in resources 
required by quantum applications in terms of qubits, operations, and reliability. 


Achieving Greater Efficiency by Breaking Abstractions 

Practical quantum computation may be achievable in the next few years, but applications will 
need to be error tolerant and make the best use of a relatively small number of quantum bits and 
operations. Compilation tools will play a critical role in achieving these goals, but they will have 
to break traditional abstractions and be customized for machine and device characteristics in a 
manner never before seen in classical computing. Following the overarching theme of achieving 
greater efficiency by breaking abstractions, we make the point once again using the following 
three typical examples from Chapter 6. 


* First, compilers can target not only a specific program input and machine size, but 
the condition of each qubit and link between qubits on a particular day! Several re- 
cent quantum circuit mapping and scheduling techniques use heuristics to optimize 
for specific program input, physical machine size and physical topology. Here, we ex- 
pose device constraints to the mapper and scheduler in the compiler. 


* Second, instead of compiling to an instruction set, compilers can directly target a set of 
analog control pulses. When we aggregate the instructions, a much more efficient over- 
all pulse can usually be found. In this case, we connect the instruction set architecture 
with the hardware characteristics. 


* Third, instead of using binary logic to target two-level qubits, compilers can target an
n-ary logic composed of qudits. By strategically occupying the (generally more noisy) 
third energy state, we can sometimes significantly shorten critical paths of computation 
and improve the overall success rate. This is an example where we combine high-level 
logic (algorithm) design with compiler optimizations. 


In Need for a New Systems Stack 

Although there already exists an ecosystem of layered quantum software tools and abstractions 
that serve as an interface between those layers, it is perhaps premature and fallacious to follow a 
model too similar to classical software. Quantum computing is at a similar stage of development 
as classical computing in the 1950s. This is actually exciting because there are so many interesting 
problems to be solved. Specifically, resources are very scarce and we are motivated to break
abstractions and pay for efficiency with greater software complexity. How much of what we 
learn in the next five years will carry forward to a future of much larger quantum machines? 
Perhaps more than we might think, as it would be hard to imagine a future in which qubits and 
quantum operations are not costly. Some physical details may always be exposed. Even classical 
computing is regressing slightly toward less abstraction as device variability increases and the 
end of Dennard scaling puts pressure on architectures to become more energy efficient. 

Quantum computing hardware continues to develop at an impressive pace. Pushing the 
limits of engineering and technology, these machines will most certainly require software adap- 
tation to hardware constraints such as device variation, operating errors, and environmental 
noise. To enable practical quantum computation, systems designers will take the responsibility 
to efficiently map high-level quantum algorithms to resource-constrained quantum machines, 
which requires optimizations at every layer of the systems stack. This includes the design of (i) a 
digital modular architecture with feasible error-correction, error-mitigation and application- 
level fault-tolerance, (ii) an expressive programming language that allows for scalable program- 
ming, (iii) an automated compilation and memory management framework that optimizes over 
algorithm-specific and hardware-specific constraints, (iv) an integrated quantum-classical co-processing
scheme that enables efficient execution of hybrid algorithms, and (v) scalable software
and hardware verification that gives us confidence in the programs we write and machines
we build.

Once we have functional quantum computers, we may even be able to use quantum algo- 
rithms to implement theorem provers and constraint solvers. Yet, we will always be bootstrap- 
ping from simulator to hardware, from hardware to larger hardware. This is, of course, similar 
to our experience with classical microprocessors, but perhaps more challenging since each qubit 
we add to future machines makes the verification and simulation problem exponentially more 
difficult for current machines. 